Chapters 20 and 21 Combined: Testing Hypotheses About Proportions (2013)

  • The null hypothesis is H0: p = 0.75; the alternative is HA: p > 0.75. With a P-value of 0.0001, this is very strong evidence against the null hypothesis. We can reject H0 and conclude that the improved version of the drug gives relief to a higher proportion of patients.
  • 3. The parameter of interest is the proportion, p, of all delinquent customers who will pay their bills. H0: p = 0.30 and HA: p > 0.30.
  • 4. The very low P-value leads us to reject the null hypothesis. There is strong evidence that the video tape is more effective in getting people to start paying their debts than just sending a letter had been.
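  • p = proportion of students in districts like this one who drop out. H0: p = 0.103 (or p ≤ 0.103); HA: p > 0.103. p̂ = 210/1782 ≈ 0.118, p0 = 0.103, q0 = 0.897. SD(p̂) = sqrt((0.103)(0.897)/1782) ≈ 0.007. z = (0.118 − 0.103)/0.007 ≈ 2.14, so normalcdf(2.14, 99) = 0.016 = 1.6% = P-value. (Carrying full precision instead of the rounded values gives z ≈ 2.06 and a P-value of about 0.020, which leads to the same decision.) This P-value is low, so we reject the null hypothesis and conclude that ______________.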
  • p = proportion of car crashes occurring within 5 miles of home. H0: p = 0.50 (or p ≤ 0.50); HA: p > 0.50. On the calculator: STAT, TESTS, #5, then enter the information. z = 4.195, P-value = 0.0000136. This P-value is extremely small, so we reject the null hypothesis and conclude …
  • a) Independence assumption: The Euro spins are independent. One spin is not going to affect the others. (With true independence, it doesn’t make sense to try to check the randomization condition or the 10% condition. These verify our assumption of independence, and we don’t need to do that!) Success/Failure condition: np̂ = 140 and nq̂ = 110 are both greater than 10, so the sample is large enough. Since the conditions are met, we can use a one-proportion z-interval to estimate the proportion of heads in Euro spins. We are 95% confident that the true proportion of heads when a Euro is spun is between 0.498 and 0.622. b) Since 0.50 is within the interval, there is no evidence that the coin is unfair. 50% is a plausible value for the true proportion of heads. (That having been said, I’d want to spin this coin a few hundred more times. It’s close!) c) The significance level is α = 0.05. It’s a two-tail test based on a 95% confidence interval.
  • 7. With a z-score of 0.62, you can’t reject the null hypothesis. The experiment shows no evidence that the wheel is not fair. 8. At α = 0.05, you can’t reject the null hypothesis because 0.30 is contained in the 90% confidence interval; it’s plausible that sending the DVDs is no more effective than just sending letters. 9. The confidence interval is from 29% to 45%. The DVD strategy is more expensive and may not be worth it. We can’t distinguish the success rate from 30% given the results of this experiment, but 45% would represent a large improvement. The bank should consider another trial, increasing their sample size to get a narrower confidence interval.
  • 10. A Type I error would mean deciding that the DVD success rate is higher than 30% when it really isn’t. They would adopt a more expensive method for collecting payments that’s no better than the less expensive strategy. 11. A Type II error would mean deciding that there’s not enough evidence to say that the DVD strategy works when in fact it does. The bank would fail to discover an effective method for increasing their revenue from delinquent accounts.
  • a) Type II. The filter decided that the message was safe, when in fact it was spam. b) Type I. The filter decided that the message was spam, when in fact it was not. c) This is analogous to lowering alpha. It takes more evidence to classify a message as spam.

Transcript

  • 1. Hypotheses: In Statistics, a hypothesis proposes a model for the world, and then we look at the data. If the data are consistent with that model, we have no reason to disbelieve the hypothesis. ◦ Data consistent with the model lend support to the hypothesis, but do not prove it. But if the facts are inconsistent with the model, we need to decide whether they are inconsistent enough to disbelieve the model. ◦ If they are inconsistent enough, we can reject the model.
  • 2. Hypothesis Testing: Think about the logic of jury trials: ◦ To prove someone is guilty, we start by assuming they are innocent. ◦ We retain that hypothesis until the facts make it unlikely beyond a reasonable doubt. ◦ Then, and only then, we reject the hypothesis of innocence and declare the person guilty.
  • 3. Hypotheses (cont.): The statistical twist is that we can quantify our level of doubt. ◦ We can use the model proposed by our hypothesis to calculate the probability that the event we’ve witnessed could happen. ◦ That’s just the probability we’re looking for: it quantifies exactly how surprised we are to see our results. ◦ This probability is called a P-value.
  • 4. Our Problem: Suppose we tossed a coin 100 times and obtained 38 heads and 62 tails. Is the coin biased toward tails? There is no way to say yes or no with 100% certainty. But we can evaluate the strength of support for the hypothesis that “the coin is biased”.
  • 5. Hypotheses (cont.): Null hypothesis, H0: established fact, no change of parameters, the status quo; a statement that we expect the data to contradict. Alternative hypothesis, HA: new conjecture, change of parameters, your claim; a statement that needs strong support from the data. Our problem: testing a hypothesis about p = proportion of times the coin turns up tails (in the long run). H0: coin is fair, p = 0.5 (or p ≤ 0.5). HA: coin is biased toward tails, p > 0.5.
  • 6. Ex: A statistics professor wants to see if more than 80% of her students enjoyed taking her class. At the end of the term, she takes a random sample of students from her large class and asks, in an anonymous survey, if the students enjoyed taking her class. Which set of hypotheses should she test? A. H0: p < 0.80, HA: p > 0.80. B. H0: p = 0.80, HA: p > 0.80. C. H0: p > 0.80, HA: p = 0.80. D. H0: p = 0.80, HA: p < 0.80.
  • 7. Ex: An online catalog company wants on-time delivery for 90% of the orders they ship. They have been shipping orders via UPS and FedEx but will switch to a new, cheaper delivery service (ShipFast) unless there is evidence that this service cannot meet the 90% on-time goal. As a test the company sends a random sample of orders via ShipFast, and then makes follow-up phone calls to see if these orders arrived on time. Which hypotheses should they test? A. H0: p < 0.90, HA: p > 0.90. B. H0: p = 0.90, HA: p > 0.90. C. H0: p > 0.90, HA: p = 0.90. D. H0: p = 0.90, HA: p < 0.90.
  • 8. Hypotheses (cont.): When the data are consistent with the model from the null hypothesis, the P-value is high and we are unable to reject the null hypothesis. ◦ In that case, we have to “retain” the null hypothesis we started with. ◦ We can’t claim to have proved it; instead we “fail to reject the null hypothesis” when the data are consistent with the null hypothesis model and in line with what we would expect from natural sampling variability. If the P-value is low enough, we’ll “reject the null hypothesis,” since what we observed would be very unlikely were the null model true. Assume that the null hypothesis H0 is true and uphold it unless the data speak strongly against it.
  • 9. Testing Hypotheses: The null hypothesis, which we denote H0, specifies a population model parameter of interest and proposes a value for that parameter. We want to compare our data to what we would expect given that H0 is true. ◦ We can do this by finding out how many standard deviations away from the proposed value we are. We then ask how likely it is to get results like ours if the null hypothesis were true.
  • 10. The Reasoning of Hypothesis Testing 1. Hypotheses ◦ The null hypothesis: To perform a hypothesis test, we must first translate our question of interest into a statement about model parameters. ◦ In general, we have H0: parameter = hypothesized value. ◦ The alternative hypothesis: The alternative hypothesis, HA, contains the values of the parameter we accept if we reject the null.
  • 11. The Reasoning of Hypothesis Testing (cont.) 2. Model ◦ The test about proportions is called a one-proportion z-test.
  • 12. One-Proportion z-Test: The conditions for the one-proportion z-test are the same as for the one-proportion z-interval. We test the hypothesis H0: p = p0 using the statistic $z = \dfrac{\hat{p} - p_0}{SD(\hat{p})}$, where $SD(\hat{p}) = \sqrt{\dfrac{p_0 q_0}{n}}$. When the conditions are met and the null hypothesis is true, this statistic follows the standard Normal model, so we can use that model to obtain a P-value.
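As a rough illustration of the statistic on this slide, the Python sketch below computes z for H0: p = p0. The function name `one_proportion_z` and the threshold of 10 used for the Success/Failure check are assumptions made for this example, not something the slides specify. The data are the 62 tails in 100 tosses from the coin problem.

```python
from math import sqrt


def one_proportion_z(successes, n, p0):
    """Return (p_hat, sd, z) for testing H0: p = p0.

    Hypothetical helper for illustration; not taken from the slides.
    """
    q0 = 1 - p0
    # Success/Failure check under the null model (threshold of 10 is assumed).
    if n * p0 < 10 or n * q0 < 10:
        raise ValueError("sample too small for the Normal model")
    p_hat = successes / n
    sd = sqrt(p0 * q0 / n)  # SD(p_hat) uses the hypothesized p0, not p_hat
    z = (p_hat - p0) / sd
    return p_hat, sd, z


# Coin problem: 62 tails in 100 tosses, H0: p = 0.5
p_hat, sd, z = one_proportion_z(successes=62, n=100, p0=0.5)
print(f"p_hat = {p_hat:.2f}, SD = {sd:.3f}, z = {z:.2f}")
```

For the coin data the standard deviation is 0.05 and z comes out at about 2.4, meaning the observed proportion sits about 2.4 standard deviations above the hypothesized value.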
  • 13. The Reasoning of Hypothesis Testing (cont.) 3. Mechanics ◦ Under “mechanics” we place the actual calculation of our test statistic from the data. ◦ Different tests will have different formulas and different test statistics. ◦ Usually, the mechanics are handled by a statistics program or calculator, but it’s good to know the formulas.
  • 14. The Reasoning of Hypothesis Testing (cont.) 3. Mechanics ◦ If the difference between what we have observed and what is expected under the null model H0 assumption is statistically significant (large enough) then we reject H0 in favor of HA.
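  • 15. Our Coin Problem: $z = \dfrac{\hat{p} - p_0}{SD(\hat{p})}$, where $SD(\hat{p}) = \sqrt{\dfrac{p_0 q_0}{n}}$ and p0 is the H0 value of the parameter, in our case p0 = 0.5. With p̂ = 62/100 = 0.62 and n = 100, SD(p̂) = 0.05 and z = (0.62 − 0.5)/0.05 = 2.4.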
  • 16. The Reasoning of Hypothesis Testing (cont.) 3. Mechanics continued ◦ The ultimate goal of the calculation is to obtain a P-value. ◦ The P-value is the probability that the observed statistic value (or an even more extreme value) could occur if the null model were correct. ◦ If the P-value is small enough, we’ll reject the null hypothesis. Note: The P-value is a conditional probability: it’s the probability of getting results at least as extreme as those observed, given that the null hypothesis is true.
  • 17. The Reasoning of Hypothesis Testing: P-value. The probability that the test statistic takes the observed or a more extreme value when the null hypothesis H0 is true. Our Problem: P-value = P(z > 2.4) = 0.0082. For a fair coin, the probability of seeing 62 or more tails in 100 tosses is less than 0.01 (1%) under the Normal model. The smaller the P-value, the stronger the evidence against H0 (that is, in favor of HA). So we reject the null hypothesis that this is a fair coin and support the alternative that it is biased toward tails.
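The P-value for the coin problem can be checked with a few lines of Python. This is only a sketch using the standard library's `statistics.NormalDist` in place of a Normal table or calculator; the variable names are illustrative.

```python
from statistics import NormalDist

z = 2.4  # test statistic from the coin problem

# Upper-tail area of the standard Normal model, because HA is p > 0.5.
p_value = 1 - NormalDist().cdf(z)

print(f"P-value = {p_value:.4f}")  # prints "P-value = 0.0082"
```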
  • 18. Just Checking 1. An allergy drug has been tested and found to give relief to 75% of the patients in a large clinical trial. Now the scientists want to see if the new improved version works even better. What would the null hypothesis and alternative hypothesis be? 2. The new drug is tested and the P-value is 0.0001. What would you conclude about the new drug?
  • 19. P-value info (Ch 21): We can use an alpha level to set a threshold on our P-value. ◦ The alpha level is also called the significance level. We can define a “rare event” arbitrarily by setting a threshold for our P-value. If our P-value is less than our alpha level, we reject the null hypothesis. We would then say that the results are statistically significant. If our P-value is greater than our alpha level, we fail to reject the null hypothesis. Alpha levels are represented using the symbol α. Typically we use α = 0.1, 0.05, or 0.01. When in doubt, we use α = 0.05. The choice partially depends on the importance of the claim being made. ◦ The more important the claim or the higher the stakes, the lower the alpha level you would use, since stronger evidence is then needed to reject H0.
  • 20. Statistically Significant (Ch 21): When we get a P-value below our alpha level (let’s assume 0.05), we can say “we reject the null hypothesis at the 5% level of significance”. Sometimes statistical significance doesn’t mean the difference is important in the context of the situation. On the other hand, sometimes an important difference may turn out not to be statistically significant. ◦ Sometimes a larger sample size can fix this.
  • 21. Statistically Significant (Ch 21): It may make you uncomfortable to make a flat reject/fail-to-reject decision. If your P-value falls just slightly above your alpha level, you’re not allowed to reject the null hypothesis (you fail to reject the null). Yet a P-value just barely below the alpha level leads to rejection. When you decide to declare a verdict, it is a good idea to report the P-value as an indication of the strength of the evidence.
  • 22. The Reasoning of Hypothesis Testing (cont.) 4. Conclusion/Decision ◦ The conclusion/decision in a hypothesis test is always a statement about the null hypothesis. The conclusion must state either: ◦ Reject H0, or ◦ Fail to reject H0 (uphold H0). And, as always, the conclusion should be stated in context.
  • 23. The Reasoning of Hypothesis Testing (cont.) 4. Conclusion ◦ Your conclusion about the null hypothesis should never be the end of a testing procedure. ◦ Often there are actions to take or policies to change.
  • 24. Alternative Hypotheses: There are three possible alternative hypotheses: ◦ HA: parameter < hypothesized value ◦ HA: parameter ≠ hypothesized value ◦ HA: parameter > hypothesized value
  • 25. Alternative Hypotheses (cont.): HA: parameter ≠ value is known as a two-sided alternative because we are equally interested in deviations on either side of the null hypothesis value. For two-sided alternatives, the P-value is the probability of deviating in either direction from the null hypothesis value.
  • 26. Alternative Hypotheses (cont.): The other two alternative hypotheses are called one-sided alternatives. A one-sided alternative focuses on deviations from the null hypothesis value in only one direction. Thus, the P-value for one-sided alternatives is the probability of deviating only in the direction of the alternative away from the null hypothesis value.
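The three alternatives differ only in which tail area of the standard Normal model counts as "at least as extreme". A minimal sketch, assuming a z statistic has already been computed; the function name `p_value_for` and the labels "less", "greater", and "two-sided" are illustrative choices, not notation from the slides.

```python
from statistics import NormalDist


def p_value_for(z, alternative):
    """P-value for a z statistic under one of the three alternatives.

    alternative: "less" (HA: parameter < value),
                 "greater" (HA: parameter > value),
                 "two-sided" (HA: parameter != value).
    """
    std_normal = NormalDist()
    if alternative == "less":
        return std_normal.cdf(z)  # lower tail only
    if alternative == "greater":
        return 1 - std_normal.cdf(z)  # upper tail only
    if alternative == "two-sided":
        return 2 * (1 - std_normal.cdf(abs(z)))  # both tails
    raise ValueError(f"unknown alternative: {alternative!r}")


# Using z = 2.4 from the coin problem
for alt in ("less", "greater", "two-sided"):
    print(f"{alt:>9}: P-value = {p_value_for(2.4, alt):.4f}")
```

With z = 2.4, the two-sided P-value is twice the upper-tail P-value, so the same data give weaker evidence when deviations in both directions are counted.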
  • 27. Alternative Hypotheses (cont.)
  • 28. Critical Values for Hypothesis Testing: Just like we used critical values in confidence intervals, we will use them with alpha levels. If our z-score is more extreme than the critical value, then we will have a P-value smaller than our alpha level.
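Critical values for the usual alpha levels can be looked up in a table or computed from the inverse of the standard Normal model. The sketch below takes the second route with `statistics.NormalDist.inv_cdf`; the helper name `critical_value` is an assumption for this example.

```python
from statistics import NormalDist


def critical_value(alpha, two_sided=False):
    """Return z* such that the rejection region has total area alpha."""
    tail = alpha / 2 if two_sided else alpha  # a two-sided test splits alpha
    return NormalDist().inv_cdf(1 - tail)


for alpha in (0.10, 0.05, 0.01):
    one = critical_value(alpha)
    two = critical_value(alpha, two_sided=True)
    print(f"alpha = {alpha:.2f}: one-sided z* = {one:.3f}, two-sided z* = {two:.3f}")

# The coin problem's z = 2.4 is beyond the one-sided critical value at alpha = 0.05,
# which agrees with its P-value of 0.0082 being below 0.05.
print(2.4 > critical_value(0.05))
```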
  • 29. Just Checking cont. 3. A bank is testing a new method for getting delinquent customers to pay their past-due credit card bills. The standard way was to send a letter (costing about $0.40 each) asking the customer to pay. That worked 30% of the time. They want to test a new method that involves sending a video tape to the customer encouraging them to contact the bank and set up a payment plan. Developing and sending the video costs about $10.00 per customer. What is the parameter of interest? What are the null and alternative hypotheses?
  • 30. Just Checking cont. 4. The bank sets up an experiment to test the effectiveness of the video tape. They mail it out to several randomly selected delinquent customers and keep track of how many actually do contact the bank to arrange payments. The bank’s statistician calculates a P-value of 0.003. What does this P-value suggest about the video tape?
  • 31. 5. Some people are concerned that new tougher standards and high-stakes tests may drive up the high school dropout rate. The National Center for Education Statistics reported that the high school dropout rate for the year 2004 was 10.3%. One school district, whose dropout rate has always been very close to the national average, reports that 210 of their 1782 students dropped out last year. Is their experience evidence that the dropout rate is increasing?
  • 32. 6. In a study of 11,000 car crashes, it was found that 5720 of them occurred within 5 miles of home. Is this significant evidence to show that more than 50% of car crashes occur within 5 miles of home?
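The mechanics for the dropout and car-crash problems can be sketched in Python as a check on a hand or calculator solution. The helper name `upper_tail_test` is hypothetical, and both tests are assumed to be one-sided with HA: p > p0. Carrying full precision gives z ≈ 2.06 for the dropout data, slightly below a hand calculation that rounds the standard deviation to 0.007.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_test(successes, n, p0):
    """One-proportion z-test of H0: p = p0 against HA: p > p0."""
    p_hat = successes / n
    sd = sqrt(p0 * (1 - p0) / n)  # SD under the null model
    z = (p_hat - p0) / sd
    p_value = NormalDist().cdf(-z)  # P(Z > z), by symmetry
    return z, p_value


# Dropout problem: 210 of 1782 students, national rate 10.3%
z_dropout, p_dropout = upper_tail_test(210, 1782, 0.103)
# Car-crash problem: 5720 of 11,000 crashes, claimed rate above 50%
z_crash, p_crash = upper_tail_test(5720, 11000, 0.50)

print(f"dropout: z = {z_dropout:.2f}, P-value = {p_dropout:.4f}")
print(f"crashes: z = {z_crash:.3f}, P-value = {p_crash:.2e}")
```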
  • 33. Confidence Intervals and Hypothesis Tests: Confidence intervals and hypothesis tests are built on the same calculations with the same assumptions and conditions. Our conclusion about the null should be consistent with whether or not the proportion in the claim falls within the confidence interval. A 95% confidence interval corresponds with a two-sided hypothesis test with α = 5%.
  • 34. Confidence Levels and Hypothesis Testing: A confidence interval with a confidence level of C% corresponds to a two-sided hypothesis test with an α level of (100 – C)%. A confidence interval with a confidence level of C% corresponds to a one-sided hypothesis test with an α level of ½(100 – C)%. ◦ Think about it: A one-sided test with α = 5% corresponds to a confidence interval with 5% on each side, giving a 90% confidence level.
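One way to see this correspondence is to compare critical values: the z* used for a 90% confidence interval should match the critical value of a one-sided test at α = 5%. The sketch below checks that numerically; the function names are illustrative, not from the slides.

```python
from statistics import NormalDist


def ci_z_star(conf_percent):
    """z* for a confidence interval with the given confidence level (in %)."""
    tail = (1 - conf_percent / 100) / 2  # area left in EACH tail
    return NormalDist().inv_cdf(1 - tail)


def one_sided_critical(alpha_percent):
    """Critical value for a one-sided test with the given alpha (in %)."""
    return NormalDist().inv_cdf(1 - alpha_percent / 100)


z_ci_90 = ci_z_star(90)
z_one_sided_5 = one_sided_critical(5)
print(f"90% CI z* = {z_ci_90:.3f}, one-sided 5% critical value = {z_one_sided_5:.3f}")
```

Both values are about 1.645, which is why a 90% interval can be used to judge a one-sided test at α = 0.05.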
  • 35. Example: Is the Euro a fair coin? Soon after the Euro was introduced as currency in Europe, it was widely reported that someone had spun a Euro 250 times and gotten heads 140 times. a. Estimate the true proportion of heads using a 95% confidence interval. (Remember to check conditions.) CI: $\hat{p} \pm z^* \sqrt{\dfrac{\hat{p}\hat{q}}{n}} = 0.56 \pm 1.96\sqrt{\dfrac{(0.56)(0.44)}{250}} = 0.56 \pm 0.062$, so the CI is (0.498, 0.622). b. Does your confidence interval provide evidence that the coin is unfair when spun? Explain. c. What is the significance level?
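The interval in part (a) can be reproduced with a short Python sketch. It uses the standard library's Normal model for z* rather than the table value 1.96, and the threshold of 10 in the Success/Failure check is the usual rule of thumb, assumed here.

```python
from math import sqrt
from statistics import NormalDist

heads, n = 140, 250
p_hat = heads / n
q_hat = 1 - p_hat

# Success/Failure condition for the interval (uses the observed counts).
conditions_met = n * p_hat >= 10 and n * q_hat >= 10

z_star = NormalDist().inv_cdf(0.975)  # about 1.96 for 95% confidence
margin = z_star * sqrt(p_hat * q_hat / n)
low, high = p_hat - margin, p_hat + margin

print(f"conditions met: {conditions_met}")
print(f"95% CI: ({low:.3f}, {high:.3f})")
print(f"0.50 inside the interval: {low <= 0.50 <= high}")
```

Because 0.50 falls inside the interval, a two-sided test at α = 0.05 would fail to reject the hypothesis that the coin is fair.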
  • 36. Just Checking 7. An experiment to test the fairness of a roulette wheel gives a z-score of 0.62. What would you conclude? 8. We encountered a bank that wondered if it could get more customers to make payments on delinquent balances by sending them a DVD urging them to set up a payment plan. Well, the bank just got back the results on their tests of this strategy. A 90% confidence interval for the success rate is (0.29, 0.45). Their old send-a-letter method had worked 30% of the time. Can you reject the null hypothesis that the proportion is still 30% at α = 0.05? Explain. 9. Given the confidence interval the bank found in their trial of DVDs, what would you recommend that they do? Should they scrap the DVD strategy?
  • 37. Errors in Hypothesis Testing: Even with our careful analysis and lots of evidence, we can make an incorrect decision. There are two ways we can make mistakes with hypothesis testing: ◦ Type I: the null hypothesis is true, but we reject it. (HOT) ◦ Type II: the null hypothesis is false, but we fail to reject it. (HAT) Which error is more serious depends on the situation.
  • 38. Type I Error (HOT): In medical terms, this would be a false positive. ◦ A healthy person is incorrectly diagnosed with a disease. In jury terms, this would mean an innocent person is convicted.
  • 39. Type II Error (HAT): In medical terms, this would be a false negative. ◦ An infected person goes undiagnosed. In jury terms, this would mean a guilty person is not convicted.
  • 40. Type I and II Errors
  • 41. Just Checking continued 10. Remember our bank? It is looking for evidence that the costlier DVD strategy produces a higher success rate than the letters it has been sending. Explain what a Type I error is in this context. What would the consequences be to the bank? 11. What’s a Type II error in the bank experiment context, and what would the consequences be?
  • 42. Example: Spam Filter 12. Suppose a spam filter uses a point system to score each email based on sender, subject, and keywords. The higher the point total, the more likely that the message is spam. We can think of the filter’s decision as a hypothesis test. The null hypothesis is that the email is a real message. A high point score would be evidence that it is junk, so the filter will reject the null hypothesis and classify the message as spam. a. When the filter allows spam to slip through into your inbox, which kind of error is this? b. Which kind of error is it when a real message gets classified as junk? c. If the filter has a default cutoff score of 50, but you reset it to 60, is that analogous to choosing a higher or lower value of α for a hypothesis test?
  • 43. Probability of Errors: To reject H0, the P-value must fall below α. When H0 is true, that happens with probability exactly α, so when you choose the level α, you are setting the probability of a Type I error to α. When H0 is false and we fail to reject it, we have made a Type II error. We assign the letter β to the probability of this mistake.
  • 44. Reducing Errors: We can reduce α to lower the chance of a Type I error, but that will have the effect of raising β. The only way to reduce both Type I and Type II errors simultaneously is to increase our sample size, which will reduce our standard deviations.
  • 45. What Can Go Wrong? Don’t interpret the P-value as the probability that H0 is true. Don’t believe too strongly in arbitrary alpha levels. Don’t confuse practical and statistical significance. Don’t forget that in spite of all your care, you might make a wrong decision.
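The trade-off between α, β, and sample size can be made concrete with a small calculation. The sketch below approximates β for a one-sided test of H0: p = 0.30 (the bank's letter success rate) using the Normal model. The true success rate of 0.37 and the sample sizes of 100 and 400 are hypothetical values chosen only for illustration, and `type_ii_error` is an invented helper name.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def type_ii_error(p0, p_true, n, alpha):
    """Approximate beta for H0: p = p0 vs HA: p > p0 under the Normal model."""
    z_star = STD_NORMAL.inv_cdf(1 - alpha)
    # H0 is rejected when the sample proportion exceeds this cutoff.
    cutoff = p0 + z_star * sqrt(p0 * (1 - p0) / n)
    # Beta is the chance the sample proportion stays at or below the cutoff
    # when the true proportion is p_true.
    sd_true = sqrt(p_true * (1 - p_true) / n)
    return STD_NORMAL.cdf((cutoff - p_true) / sd_true)


for n in (100, 400):  # hypothetical sample sizes
    for alpha in (0.05, 0.01):
        beta = type_ii_error(p0=0.30, p_true=0.37, n=n, alpha=alpha)
        print(f"n = {n}, alpha = {alpha}: beta = {beta:.3f}")
```

Under these assumptions, lowering α from 0.05 to 0.01 at a fixed n raises β, while moving from n = 100 to n = 400 lowers β at either α level.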