Upcoming SlideShare
×

# Week 15 PowerPoint

750 views

Published on

1 Like
Statistics
Notes
• Full Name
Comment goes here.

Are you sure you want to Yes No
• Be the first to comment

Views
Total views
750
On SlideShare
0
From Embeds
0
Number of Embeds
26
Actions
Shares
0
4
0
Likes
1
Embeds 0
No embeds

No notes for slide

### Week 15 PowerPoint

1. 1. Copyright ©2005 Brooks/Cole, a division of Thomson Learning, Inc. The Diversity of Samples from the Same Population Chapter 19
2. 2. Copyright ©2005 Brooks/Cole, a division of Thomson Learning, Inc. 2 19.1 Setting the Stage Working Backward from Samples to Populations • Start with question about population. • Collect a sample from the population, measure variable. • Answer question of interest for sample. • With statistics, determine how close such an answer, based on a sample, would tend to be from the actual answer for the population.Understanding Dissimilarity among Samples • Suppose most samples are likely to provide an answer that is within 10% of the population answer. • Then the population answer is expected to be within 10% of whatever value the sample gave. • So, can make a good guess about the population value.
3. 3. Copyright ©2005 Brooks/Cole, a division of Thomson Learning, Inc. 3 19.2 What to Expect of Sample Proportions 40% of population carry a certain gene Do Not Carry Gene = , Do Carry Gene = X A slice of the population:
4. 4. Copyright ©2005 Brooks/Cole, a division of Thomson Learning, Inc. 4 Sample 1: Proportion with gene = 12/25 = 0.48 = 48% Sample 2: Proportion with gene = 9/25 = 0.36 = 36% Sample 3: Proportion with gene = 10/25 = 0.40 = 40% Sample 4: Proportion with gene = 7/25 = 0.28 = 28% Possible Samples • Each sample gave a different answer. • Sample answer may or may not match population answer.
5. 5. Copyright ©2005 Brooks/Cole, a division of Thomson Learning, Inc. 5 1. There exists an actual population with fixed proportion who have a certain trait. Or There exists a repeatable situation for which a certain outcome is likely to occur with fixed probability. 2. Random sample selected from population (so probability of observing the trait is same for each sample unit). Or Situation repeated numerous times, with outcome each time independent of all other times. 3. Size of sample or number of repetitions is relatively large – large enough to see at least 5 of each of the two possible responses. Conditions for Rule for Sample Proportions
6. 6. Copyright ©2005 Brooks/Cole, a division of Thomson Learning, Inc. 6 Example 1: Election Polls Pollster wants to estimate proportion of voters who favor a certain candidate. Voters are the population units, and favoring candidate is opinion of interest. Example 2: Television Ratings TV rating firm wants to estimate proportion of households with television sets tuned to a certain television program. Collection of all households with television sets makes up the population, and being tuned to program is trait of interest.
7. 7. Copyright ©2005 Brooks/Cole, a division of Thomson Learning, Inc. 7 Example 3: Consumer Preferences Manufacturer of soft drinks wants to know what proportion of consumers prefers new mixture of ingredients compared with old recipe. Population consists of all consumers, and response of interest is preference of new formula over old one. Example 4: Testing ESP Researcher wants to know the probability that people can successfully guess which of 5 symbols is on a hidden card. Each symbol is equally likely. Repeatable situation is a guess, and response of interest is successful guess. Is the probability of correct guess higher than 20%?
8. 8. Copyright ©2005 Brooks/Cole, a division of Thomson Learning, Inc. 8 If numerous samples or repetitions of the same size are taken, the frequency curve made from proportions from various samples will be approximately bell-shaped. Mean will be true proportion from the population. Standard deviation will be: (true proportion)(1 – true proportion) sample size Defining the Rule for Sample Proportions
9. 9. Copyright ©2005 Brooks/Cole, a division of Thomson Learning, Inc. 9 Example 5: Using Rule for Sample Proportions Suppose 40% of all voters in U.S. favor candidate X. Pollsters take a sample of 2400 people. What sample proportion would be expected to favor candidate X? The sample proportion could be anything from a bell- shaped curve with mean 0.40 and standard deviation: (0.40)(1 – 0.40) = 0.01 2400 • 68% chance sample proportion is between 39% and 41% • 95% chance sample proportion is between 38% and 42% • almost certain sample proportion is between 37% and 43% For our sample of 2400 people:
10. 10. Copyright ©2005 Brooks/Cole, a division of Thomson Learning, Inc. 10 19.3 What to Expect of Sample Means • Want to estimate average weight loss for all who attend national weight-loss clinic for 10 weeks. • Unknown to us, population mean weight loss is 8 pounds and standard deviation is 5 pounds. • If weight losses are approximately bell-shaped, 95% of individual weight losses will fall between –2 (a gain of 2 pounds) and 18 pounds lost.
11. 11. Copyright ©2005 Brooks/Cole, a division of Thomson Learning, Inc. 11 Results: Sample 1: Mean = 8.32 pounds, std dev = 4.74 pounds Sample 2: Mean = 6.76 pounds, std dev = 4.73 pounds Sample 3: Mean = 8.48 pounds, std dev = 5.27 pounds Sample 4: Mean = 7.16 pounds, std dev = 5.93 pounds Possible Samples • Each sample gave a different sample mean, but close to 8. • Sample standard deviation also close to 5 pounds. Sample 1: 1,1,2,3,4,4,4,5,6,7,7,7,8,8,9,9,11,11,13,13,14,14,15,16,16 Sample 2: –2, 2,0,0,3,4,4,4,5,5,6,6,8,8,9,9,9,9,9,10,11,12,13,13,16 Sample 3: –4,–4,2,3,4,5,7,8,8,9,9,9,9,9,10,10,11,11,11,12,12,13,14,16,18 Sample 4: –3,–3,–2,0,1,2,2,4,4,5,7,7,9,9,10,10,10,11,11,12,12,14,14,14,19
12. 12. Copyright ©2005 Brooks/Cole, a division of Thomson Learning, Inc. 12 1. Population of measurements is bell-shaped, and a random sample of any size is measured. OR 2. Population of measurements of interest is not bell-shaped, but a large random sample is measured. Sample of size 30 is considered “large,” but if there are extreme outliers, better to have a larger sample. Conditions for Rule for Sample Means
13. 13. Copyright ©2005 Brooks/Cole, a division of Thomson Learning, Inc. 13 Example 6: Average Weight Loss Weight-loss clinic interested in average weight loss for participants in its program. Weight losses assumed to be bell-shaped, so Rule applies for any sample size. Population is all current and potential clients, and measurement is weight loss. Example 7: Average Age at Death Researcher is interested in average age at which left-handed adults die, assuming they have lived to be at least 50. Ages at death not bell-shaped, so need at least 30 such ages at death. Population is all left-handed people who live to be at least 50 years old. The measurement is age at death.
14. 14. Copyright ©2005 Brooks/Cole, a division of Thomson Learning, Inc. 14 If numerous samples or repetitions of the same size are taken, the frequency curve of means from various samples will be approximately bell-shaped. Mean will be same as mean for the population. Standard deviation will be: population standard deviation sample size Defining the Rule for Sample Means
15. 15. Copyright ©2005 Brooks/Cole, a division of Thomson Learning, Inc. 15 Example 9: Using Rule for Sample Means Weight-loss example, population mean and standard deviation were 8 pounds and 5 pounds, respectively, and we were taking random samples of size 25. Potential sample means represented by a bell-shaped curve with mean of 8 pounds and standard deviation: 5 = 1 pound 25 • 68% chance sample mean is between 7 and 9 pounds • 95% chance sample mean is between 6 and 10 pounds • almost certain sample mean is between 5 and 11 pounds For our sample of 25 people:
16. 16. Copyright ©2005 Brooks/Cole, a division of Thomson Learning, Inc. 16 Increasing the Size of the Sample Weight-loss example: suppose a sample of 100 people instead of 25 was taken. Potential sample means still represented by a bell-shaped curve with mean of 8 pounds but standard deviation: 5 = 0.5 pounds 100 • 68% chance sample mean is between 7.5 and 8.5 pounds • 95% chance sample mean is between 7 and 9 pounds • almost certain sample mean is between 6.5 and 9.5 pounds For our sample of 100 people: Larger samples tend to result in more accurate estimates of population values than do smaller samples.
17. 17. Copyright ©2005 Brooks/Cole, a division of Thomson Learning, Inc. 17 19.4 What to Expect in Other Situations • So far two common situations – (1) want to know what proportion of a population fall into one category of a categorical variable, (2) want to know the mean of a population for a measurement variable. • Many other situations and similar rules apply to most other situations
18. 18. Copyright ©2005 Brooks/Cole, a division of Thomson Learning, Inc. 18 • Confidence Intervals Interval of values the researcher is fairly sure covers the true value for the population. • Hypothesis Testing Uses sample data to attempt to reject the hypothesis that nothing interesting is happening—that is, to reject the notion that chance alone can explain the sample results. Two Basic Statistical Techniques
19. 19. Copyright ©2005 Brooks/Cole, a division of Thomson Learning, Inc. 19 Case Study 19.1: Do Americans Really Vote When They Say They Do? Reported in Time magazine (Nov 28, 1994): • Telephone poll of 800 adults (2 days after election) – 56% reported they had voted. • Committee for Study of American Electorate stated only 39% of American adults had voted. Could it be the results of poll simply reflected a sample that, by chance, voted with greater frequency than general population?
20. 20. Copyright ©2005 Brooks/Cole, a division of Thomson Learning, Inc. 20 Case Study 19.1: Do Americans Really Vote When They Say They Do? Suppose only 39% of American adults voted. We can expect sample proportions to be represented by a bell- shaped curve with mean 0.39 and standard deviation: (0.39)(1 – 0.39) = 0.017 800 For our sample of 800 adults, we can be almost certain to see a sample proportion between 33.9% and 44.1%. The reported 56% is far above 44.1%. The standard score for 56% is: (0.56 – 0.39)/0.017 = 10. Virtually impossible to see a standard score of 10 or more.
21. 21. Copyright ©2005 Brooks/Cole, a division of Thomson Learning, Inc. 21 For Those Who Like Formulas
22. 22. Copyright ©2005 Brooks/Cole, a division of Thomson Learning, Inc. Estimating Proportions with Confidence Chapter 20
23. 23. Copyright ©2005 Brooks/Cole, a division of Thomson Learning, Inc. 23 20.1 Confidence Intervals Confidence Interval: an interval of values computed from sample data that is almost sure to cover the true population number. • Most common level of confidence used is 95%. So willing to take a 5% risk that the interval does not actually cover the true value. • We never know for sure whether any one given confidence interval covers the truth. However, … • In long run, 95% of all confidence intervals tagged with 95% confidence will be correct and 5% of them will be wrong.
24. 24. Copyright ©2005 Brooks/Cole, a division of Thomson Learning, Inc. 24 20.2 Three Examples of Confidence Intervals from the Media Most polls report a margin of error along with the proportion of the sample that had each opinion. Formula for a 95% confidence interval: sample proportion ± margin of error
25. 25. Copyright ©2005 Brooks/Cole, a division of Thomson Learning, Inc. 25 Example 1: A Public Opinion Poll Poll reported in The Sacramento Bee (Nov 19, 2003, p. A20) 54% of respondents agreed that gay and lesbian couples could be good parents. Report also provided … Source: Pew Research Center for the People and the Press survey of 1,515 U.S. adults, Oct 15-19; margin of error 3 percentage points. What proportion of entire adult population at that time would agree that gay and lesbian couples could be good parents? sample proportion ± margin of error 59% ± 5% 54% to 64% A 95% confidence interval for that population proportion: Interval resides above 50% => a majority of adult Americans in 2003 believed such couples can be good parents.
26. 26. Copyright ©2005 Brooks/Cole, a division of Thomson Learning, Inc. 26 Example 2: Number of AIDS Cases in U.S. Story reported in the Davis (CA) Enterprise (14 December 1993, p. A5) For the first time, survey data is now available on a randomly chosen cross section of Americans. Conducted by the National Center for Health Statistics, it concludes that 550,000 Americans are actually infected. Dr. Geraldine McQuillan, who presented the analysis, said the statistical margin of error in the survey suggests that the true figure is probably between 300,000 and just over 1 million people. “The real number may be a little under or a bit over the CDC estimate” [of 1 million], she said, “but it is not 10 million.” Some estimates had been as high as 10 million people, but the CDC had estimated the number at about 1 million. Could the results of this survey rule out the fact that the number of people infected may be as high as 10 million?
27. 27. Copyright ©2005 Brooks/Cole, a division of Thomson Learning, Inc. 27 Example 3: The Debate Over Passive Smoking Source: Wall Street Journal (July 28, 1993, pp. B-1, B-4) U.S. EPA says there is a 90% probability that the risk of lung cancer for passive smokers is somewhere between 4% and 35% higher than for those who aren’t exposed to environmental smoke. To statisticians, this calculation is called the “90% confidence interval.” And that, say tobacco-company statisticians, is the rub. “Ninety-nine percent of all epidemiological studies use a 95% confidence interval,” says Gio B. Gori, director of the Health Policy Center in Bethesda, Md. The EPA believes it is inconceivable that breathing in smoke containing cancer-causing substances could be healthy and any hint in the report that it might be would be meaningless and confusing. (p. B-4) Problem: EPA used 90% because the amount of data available at the time did not allow an extremely accurate estimate of true change in risk of lung cancer for passive smokers. The 95% confidence interval actually went below zero percent, indicating it’s possible passive smoke reduces risk.
28. 28. Copyright ©2005 Brooks/Cole, a division of Thomson Learning, Inc. 28 20.3 Constructing a Confidence Interval for a Proportion Recall Rule for Sample Proportions: If numerous samples or repetitions of same size are taken, the frequency curve made from proportions from various samples will be approximately bell-shaped. Mean will be true proportion from the population. Standard deviation will be: (true proportion)(1 – true proportion) sample size In 95% of all samples, the sample proportion will fall within 2 standard deviations of the mean, which is the true proportion.
29. 29. Copyright ©2005 Brooks/Cole, a division of Thomson Learning, Inc. 29 A Confidence Interval for a Proportion In 95% of all samples, the true proportion will fall within 2 standard deviations of the sample proportion. Problem: Standard deviation uses the unknown “true proportion”. Solution: Substitute the sample proportion for the true proportion in the formula for the standard deviation. A 95% confidence interval for a population proportion: sample proportion ± 2(S.D.) Where S.D. = (true proportion)(1 – true proportion) sample size A technical note: To be exact, we would use 1.96(S.D.) instead of 2(S.D.). However, rounding 1.96 off to 2.0 will not make much difference.
30. 30. Copyright ©2005 Brooks/Cole, a division of Thomson Learning, Inc. 30 Example 4: Wife Taller than the Husband? In a random sample of 200 British couples, the wife was taller than the husband in only 10 couples. • sample proportion = 10/200 = 0.05 or 5% • standard deviation = • confidence interval = .05 ± 2(0.015) = .05 ± .03 or 0.02 to 0.08 (0.05)(1 – 0.05) = 0.015 200 Interpretation: We are 95% confident that of all British couples, between .02 (2%) and .08 (8%) are such that the wife is taller than her husband.
31. 31. Copyright ©2005 Brooks/Cole, a division of Thomson Learning, Inc. 31 Example 5: Experiment in ESP Experiment: Subject tried to guess which of four videos the “sender” was watching in another room. Of the 165 cases, 61 resulted in successful guesses. • sample proportion = 61/165 = 0.37 or 37% • standard deviation = • confidence interval = .37 ± 2(0.038) = .37 ± .08 or 0.29 to 0.45 (0.37)(1 – 0.37) = 0.038 165 Interpretation: We are 95% confident that the probability of a successful guess in this situation is between .29 (29%) and .45 (45%). Notice this interval lies entirely above the 25% value expected by chance.
32. 32. Copyright ©2005 Brooks/Cole, a division of Thomson Learning, Inc. 32 Example 6: Quit Smoking with the Patch Study: Of 120 volunteers randomly assigned to use a nicotine patch, 55 had quit smoking after 8 weeks. • sample proportion = 55/120 = 0.46 or 46% • standard deviation = • confidence interval = .46 ± 2(0.045) = .46 ± .09 or 0.37 to 0.55 (0.46)(1 – 0.46) = 0.045 120 Interpretation: We are 95% confident that between 37% and 55% of smokers treated in this way would quit smoking after 8 weeks. The placebo group confidence interval is 13% to 27%, which does not overlap with the nicotine patch interval.
33. 33. Copyright ©2005 Brooks/Cole, a division of Thomson Learning, Inc. 33 Other Levels of Confidence • 68% confidence interval: sample proportion ± 1(S.D.) • 99.7% confidence interval: sample proportion ± 3(S.D.) • 90% confidence interval: sample proportion ± 1.645(S.D.) • 99% confidence interval: sample proportion ± 2.576(S.D.)
34. 34. Copyright ©2005 Brooks/Cole, a division of Thomson Learning, Inc. 34 How the Margin of Error was Derived Two formulas for a 95% confidence interval: • sample proportion ± 1/√n (from conservative m.e. in Chapter 4) • sample proportion ± 2(S.D.) Two formulas are equivalent when the proportion used in the formula for S.D. is 0.50. Then 2(S.D.) is simply 1/√n , which is our conservative formula for margin of error – called conservative because the true margin of error is actually likely to be smaller.
35. 35. Copyright ©2005 Brooks/Cole, a division of Thomson Learning, Inc. 35 Case Study 20.1: A Winning Confidence Interval Loses in Court • Sears company erroneously collected and paid city sales taxes for sales made to individuals outside the city limits. • Sears took a random sample of sales slips for the period in question to estimate the proportion of all such sales. • 95% confidence interval for true proportion of all sales made to out-of-city customers: .367 ± .03, or .337 to .397. • To estimate amount of tax owed, multiplied percentage by total tax they had paid of \$76,975. Result = \$28,250 with 95% confidence interval from \$25,940 to \$30,559. Source: Gastwirth (1988, p. 495)
36. 36. Copyright ©2005 Brooks/Cole, a division of Thomson Learning, Inc. 36 Case Study 20.1: A Winning Confidence Interval Loses in Court • Judge did not accept use of sampling and required Sears to examine all sales records. • Total they were owed about \$27,586. Sampling method Sears had used provided a fairly accurate estimate of the amount they were owed. • Took Sears 300 person-hours to conduct the sample and 3384 hours to do the full audit. • In fairness, the judge in this case was simply following the law; the sales tax return required a sale-by-sale computation. A well designed sampling audit may yield a more accurate estimate than a less carefully carried out complete audit. Source: Gastwirth (1988, p. 495)
37. 37. Copyright ©2005 Brooks/Cole, a division of Thomson Learning, Inc. 37 For Those Who Like Formulas