Statr session 17 and 18
Praxis Weekend Business Analytics

Published in: Education, Technology, Business

Transcript

  • 1. Learning Objectives • Understand how to test a hypothesis about a single population parameter: – Proportion (using the z-statistic) – Variance (using the $\chi^2$-statistic) • Calculate the probability of a Type II error when failing to reject the null hypothesis • Test hypotheses and construct confidence intervals about the difference in two population means using the z statistic. • Test hypotheses and construct confidence intervals about the difference in two population means using the t statistic.
  • 2. Learning Objectives (continued) • Test hypotheses and construct confidence intervals about the difference in two related populations. • Test hypotheses and construct confidence intervals about the differences in two population proportions.
  • 3. Hypothesis Test of p Suppose a company held 26% of market share for several years. Due to a massive marketing effort and improved product quality, company officials believe that the market share has increased, and they want to prove it statistically. In a random sample of 140 users, 48 used their product. Does this present evidence that their market share has increased? Test it at the 5% level of significance.
  • 4. Hypothesis Test of p 1 – Establish the null and alternative hypotheses: H0: p = 0.26 vs. Ha: p > 0.26. 2 – Determine the appropriate statistical test: the z-test for proportions, $z = \dfrac{\hat{p} - p}{\sqrt{pq/n}}$, where q = 1 - p. It is appropriate if the following two conditions are met: (a) the sample was randomly selected from the population; (b) np ≥ 5 and nq ≥ 5. For our data, 140(0.26) = 36.4 > 5 and 140(0.74) = 103.6 > 5, so this condition is met. 3 – Set α, the Type I error rate / significance level: choose the common value of α = 0.05.
  • 5. Hypothesis Test of p $z = \dfrac{\hat{p} - p}{\sqrt{pq/n}} = \dfrac{0.343 - 0.26}{\sqrt{(0.26)(1 - 0.26)/140}} = \dfrac{0.083}{0.037} = 2.24$
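As a minimal sketch (not part of the original slides), the z calculation for the market-share example can be reproduced in Python with only the standard library. The variable names are illustrative assumptions, and the critical value comes from `statistics.NormalDist` rather than a printed z table.

```python
import math
from statistics import NormalDist

n, x = 140, 48      # sample size and number of sampled users of the product
p0 = 0.26           # market share hypothesized under H0
alpha = 0.05        # significance level for the upper-tailed test

p_hat = x / n                                  # sample proportion, about 0.343
std_error = math.sqrt(p0 * (1 - p0) / n)       # standard error computed under H0
z = (p_hat - p0) / std_error

z_crit = NormalDist().inv_cdf(1 - alpha)       # upper-tail critical value, about 1.645
p_value = 1 - NormalDist().cdf(z)              # one-tailed p-value
reject_h0 = z > z_crit

print(f"z = {z:.2f}, critical z = {z_crit:.3f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

With these data the observed z of about 2.24 exceeds the one-tailed critical value of about 1.645, so H0 is rejected at the 5% level.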
  • 6. A small business has 37 employees. Because of the uncertain demand for its product, the company usually pays overtime on any given week. The company assumed that about 50 total hours of overtime per week is required and that the variance on this figure is about 25. Company officials want to know whether the variance of overtime hours has changed. The data below are a random sample of 16 weeks of overtime in hours per week. Assume hours of overtime are normally distributed. Use these data to test the null hypothesis that the variance of overtime data is 25. 57 56 52 44 46 53 44 44 48 51 55 48 63 53 51 50
  • 7. Step 1: Hypothesize H0: $\sigma^2 = 25$ vs. Ha: $\sigma^2 \neq 25$. Step 2: When the underlying population is normally distributed, the statistic $(n-1)s^2/\sigma^2$ follows a chi-square distribution with n - 1 degrees of freedom. The test statistic is $\chi^2 = \dfrac{(n-1)s^2}{\sigma^2}$.
  • 8. • Step 3: Choose α = 0.10 (so α/2 = 0.05). • Step 4: The degrees of freedom are 16 – 1 = 15. The lower and upper critical chi-square values are $\chi^2_{1-0.05,\,15} = \chi^2_{0.95,\,15} = 7.3$ and $\chi^2_{0.05,\,15} = 25.0$. • Step 5: The data are listed in the text. • Step 6: The sample variance is $s^2 = 28.1$. The observed chi-square value is $\chi^2 = (n-1)s^2/\sigma^2 = (15)(28.1)/25 = 16.9$.
  • 9. • Step 7: The observed chi-square value is in the nonrejection region because $\chi^2_{0.95,\,15} = 7.3 < \chi^2_{\text{observed}} = 16.9 < \chi^2_{0.05,\,15} = 25.0$. • Step 8: This result indicates to the company managers that the variance of weekly overtime hours is about what they expected.
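The chi-square test for the overtime data can be checked with a short Python sketch. This is an illustration rather than the presenter's own code: the variable names are assumptions, and the two critical values are copied from the slides because the standard library has no chi-square table.

```python
from statistics import variance

# Sixteen weeks of overtime hours from the example.
overtime_hours = [57, 56, 52, 44, 46, 53, 44, 44, 48, 51, 55, 48, 63, 53, 51, 50]
sigma_sq_0 = 25                              # variance hypothesized under H0
chi_sq_lower, chi_sq_upper = 7.3, 25.0       # table values for df = 15, alpha/2 = 0.05

n = len(overtime_hours)
s_sq = variance(overtime_hours)              # sample variance (n - 1 in the denominator)
chi_sq = (n - 1) * s_sq / sigma_sq_0         # observed chi-square statistic

reject_h0 = chi_sq < chi_sq_lower or chi_sq > chi_sq_upper
print(f"s^2 = {s_sq:.4f}, chi-square = {chi_sq:.2f}, reject H0: {reject_h0}")
```

The unrounded sample variance is 28.0625, which gives a chi-square value of about 16.84; the slides report 16.9 because they round the variance to 28.1 first. Either way the statistic falls between the two critical values.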
  • 10. [Figure: chi-square distribution with rejection regions of area .05 in each tail and critical values 7.3 and 25.0.]
  • 11. Solving for Type II Errors • When the null hypothesis is not rejected, then either a correct decision is made or an incorrect decision is made. • If an incorrect decision is made, that is, if the null hypothesis is not rejected when it is false, then a Type II error has occurred.
  • 12. Solving for Type II Errors (Soft Drink) • Suppose a test is conducted on the following hypotheses about the amount of liquid in a 12 ounce soft drink can: H0: μ = 12 ounces vs. Ha: μ < 12 ounces. The sample size is 60, the sample mean is 11.985, and the population standard deviation (σ) is assumed to be 0.10. • The first step in determining the probability of a Type II error is to calculate a critical value for the sample mean (or proportion or variance, etc.).
  • 13. Solving for Type II Errors (Soft Drink) $z_c = \dfrac{\bar{x}_c - \mu}{\sigma/\sqrt{n}}$, so $-1.645 = \dfrac{\bar{x}_c - 12}{0.10/\sqrt{60}}$ and $\bar{x}_c = 11.979$. In testing the null hypothesis by the critical value method, this value is used as the cutoff for the nonrejection region. For any sample mean less than 11.979, the null hypothesis is rejected. For any sample mean greater than 11.979, the null hypothesis is not rejected.
  • 14. Solving for Type II Errors (Soft Drink) The Type II error rate (β) varies with different values of the true parameter. For example, if the true mean is 11.99, the corresponding z-value for β is $z_1 = \dfrac{\bar{x}_c - \mu_1}{\sigma/\sqrt{n}} = \dfrac{11.979 - 11.99}{0.10/\sqrt{60}} = -0.85$
  • 15. Solving for Type II Errors (Soft Drink) • Recall that a Type II error is made when you fail to reject the null hypothesis when you should reject it. Thus, you want to calculate $P(\bar{x} \ge \bar{x}_c \mid \mu = \mu_1) = P(\bar{x} \ge 11.979 \mid \mu = 11.99) = P(z \ge -0.85) = .802$ • Thus, there is an 80.2% chance of committing a Type II error if the alternative mean is 11.99. That is quite a high chance of being wrong, but 11.99 is so close to 12 that you would need a lot of data to show that the two values are statistically different. • Note that a Type II error is the only error to be concerned about here, because the sample leads to not rejecting the null hypothesis: observed z = -1.16 > z_c = -1.645 (for α = 0.05).
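The two-step Type II error calculation for the soft drink example (find the critical sample mean, then find the probability of landing in the nonrejection region under the alternative mean) can be sketched in Python as follows. The variable names are illustrative assumptions, and the code carries full precision instead of the rounded table values.

```python
import math
from statistics import NormalDist

mu0 = 12.0       # mean under H0 (ounces)
mu1 = 11.99      # alternative (true) mean being examined
sigma = 0.10     # population standard deviation
n = 60           # sample size
alpha = 0.05     # lower-tailed test

std_error = sigma / math.sqrt(n)
z_c = NormalDist().inv_cdf(alpha)        # lower-tail critical z, about -1.645
x_crit = mu0 + z_c * std_error           # critical sample mean, about 11.979

# A Type II error occurs when the sample mean is at or above x_crit even though mu = mu1.
z1 = (x_crit - mu1) / std_error
beta = 1 - NormalDist().cdf(z1)

print(f"critical mean = {x_crit:.3f}, z1 = {z1:.2f}, beta = {beta:.3f}")
```

Without intermediate rounding, z1 is about -0.87 and β is about 0.81. The slides obtain -0.85 and .802 because they round the critical mean to 11.979 before computing z1.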
  • 16. Operating Characteristic and Power Curve • Because the probability of committing a Type II error changes for each different value of the alternative parameter, it is common to examine a series of possible alternative values. • The power of a test is the probability of rejecting the null hypothesis when it is false. • Power = 1 - β.
  • 17. • The difference between two sample means is used to test the difference between the two population means. • The Central Limit Theorem states that the difference in two sample means is normally distributed for large sample sizes (both n1 and n2 > 30), regardless of the shape of the populations.
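To illustrate how β and power change across alternative values, here is a hedged Python sketch built on the soft drink test (H0: μ = 12, σ = 0.10, n = 60, α = 0.05, lower-tailed). The function name `type_ii_error` and the list of alternative means other than 11.99 are assumptions chosen for illustration; they do not come from the slides.

```python
import math
from statistics import NormalDist

def type_ii_error(mu1, mu0=12.0, sigma=0.10, n=60, alpha=0.05):
    """Return beta for the lower-tailed z test of H0: mu = mu0 when the true mean is mu1."""
    std_error = sigma / math.sqrt(n)
    x_crit = mu0 + NormalDist().inv_cdf(alpha) * std_error   # cutoff for the sample mean
    # Probability that the sample mean falls in the nonrejection region under mu1.
    return 1 - NormalDist(mu1, std_error).cdf(x_crit)

alternative_means = [11.99, 11.98, 11.97, 11.96]   # illustrative alternatives
betas = [type_ii_error(mu1) for mu1 in alternative_means]

for mu1, beta in zip(alternative_means, betas):
    print(f"true mean {mu1:.2f}: beta = {beta:.3f}, power = {1 - beta:.3f}")
```

Plotting β against the alternative mean gives the operating characteristic curve, and plotting 1 - β gives the power curve. The farther the true mean lies below 12, the smaller β becomes and the larger the power.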
  • 18. Hypothesis Testing for Differences Between Means: The Wage Example As an example, we want to conduct a hypothesis test to determine whether the average annual wage for an advertising manager is different from the average annual wage of an auditing manager. Because we are testing to determine whether the means are different, it might seem logical that the null and alternative hypotheses would be Ho: μ1 = μ2 Ha: μ1 ≠ μ2 where advertising managers are Population 1 and auditing managers are Population 2.
  • 19. Hypothesis Testing for Differences Between Means: The Wage Example H0: $\mu_1 = \mu_2$ vs. Ha: $\mu_1 \neq \mu_2$; α = 0.05, α/2 = 0.025, $z_{0.025} = 1.96$. The two hypotheses can also be expressed as H0: $\mu_1 - \mu_2 = 0$ vs. Ha: $\mu_1 - \mu_2 \neq 0$. The analysis tests whether there is a difference in the average wage, so this is a two-tailed test.
  • 20. Hypothesis Testing for Differences Between Means: The Wage Example [Figure: standard normal curve with rejection regions of area α/2 = .025 in each tail and critical values $z_c = \pm 1.96$.] If z < -1.96 or z > 1.96, reject Ho. If -1.96 ≤ z ≤ 1.96, do not reject Ho.
  • 21. Hypothesis Testing for Differences Between Means: The Wage Example
      Advertising managers: $n_1 = 32$, $\bar{x}_1 = 70.700$, $\sigma_1 = 16.253$, $\sigma_1^2 = 264.164$
      Auditing managers: $n_2 = 34$, $\bar{x}_2 = 62.187$, $\sigma_2 = 12.900$, $\sigma_2^2 = 166.411$
  • 22. Hypothesis Testing for Differences Between Means: The Wage Example $z = \dfrac{(70.700 - 62.187) - (0)}{\sqrt{\dfrac{264.164}{32} + \dfrac{166.411}{34}}} = 2.35$ Since the observed value of 2.35 is greater than 1.96, reject the null hypothesis. That is, there is a significant difference between the average annual wage of advertising managers and the average annual wage of auditing managers.
  • 23. Hypothesis Testing for Differences Between Means: The Wage Example [Figure: sampling distribution of $\bar{x}_1 - \bar{x}_2$ under Ho: $\mu_1 - \mu_2 = 0$ vs. Ha: $\mu_1 - \mu_2 \neq 0$, with rejection regions of area α/2 = .025 in each tail beyond the critical values of $\bar{x}_1 - \bar{x}_2$.]
  • 24. Hypothesis Testing for Differences Between Means: The Wage Example If z < -1.96 or z > 1.96, reject H0. If -1.96 ≤ z ≤ 1.96, do not reject H0. [Figure: standard normal curve with rejection regions of area α/2 = .025 in each tail and critical values $z_c = \pm 1.96$.] $z = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \dfrac{(70.700 - 62.187) - (0)}{\sqrt{\dfrac{264.164}{32} + \dfrac{166.411}{34}}} = 2.35$ Since z = 2.35 > 1.96, reject H0.
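The wage comparison can be reproduced with a small Python sketch of the two-sample z test with known population variances. The helper name `two_sample_z` is an assumption introduced for illustration; the inputs are the summary statistics given in the slides.

```python
import math
from statistics import NormalDist

def two_sample_z(mean1, mean2, var1, var2, n1, n2, hypothesized_diff=0.0):
    """z statistic for the difference in two means when the population variances are known."""
    std_error = math.sqrt(var1 / n1 + var2 / n2)
    return ((mean1 - mean2) - hypothesized_diff) / std_error

# Population 1: advertising managers; Population 2: auditing managers.
z = two_sample_z(70.700, 62.187, 264.164, 166.411, 32, 34)

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)        # two-tailed critical value, about 1.96
p_value = 2 * (1 - NormalDist().cdf(abs(z)))        # two-tailed p-value
reject_h0 = abs(z) > z_crit

print(f"z = {z:.2f}, critical z = ±{z_crit:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

The statistic comes out at about 2.35, beyond the upper critical value, which matches the slide's decision to reject H0.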
  • 25. Demonstration Problem 1 A sample of 87 professional working women showed that the average amount paid annually into a private pension fund per person was $3352. The population standard deviation is $1100. A sample of 76 professional working men showed that the average amount paid annually into a private pension fund per person was $5727, with a population standard deviation of $1700. A women’s activist group wants to “prove” that women do not pay as much per year as men into private pension funds. If they use α = .001 and these sample data, will they be able to reject a null hypothesis that women annually pay the same as or more than men into private pension funds? Use the eight-step hypothesis-testing process.
  • 26. Demonstration Problem 1 (Step 1) Ho: $\mu_1 - \mu_2 = 0$ vs. Ha: $\mu_1 - \mu_2 < 0$, where women are Population 1 and men are Population 2. [Figure: standard normal curve with a lower-tail rejection region of area α = .001 and critical value $z_c = -3.08$.]
  • 27. Demonstration Problem 1 (Steps 2–7)
      Women: $\bar{x}_1 = \$3{,}352$, $\sigma_1 = \$1{,}100$, $n_1 = 87$
      Men: $\bar{x}_2 = \$5{,}727$, $\sigma_2 = \$1{,}700$, $n_2 = 76$
      α = .001, critical value $z_c = -3.08$. If z < -3.08, reject Ho. If z ≥ -3.08, do not reject Ho.
      $z = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \dfrac{(3352 - 5727) - (0)}{\sqrt{\dfrac{1100^2}{87} + \dfrac{1700^2}{76}}} = -10.42$
      Since z = -10.42 < -3.08, reject Ho.
  • 28. Demonstration Problem 1 (Step 8 – Business implications) • The evidence is substantial that women, on average, pay less than men into private pension funds annually. • If the null hypothesis were true, the probability of obtaining an observed z value as low as -10.42 would be virtually zero.
  • 29. Confidence Interval • Sometimes the goal is to estimate the difference in two population means by taking a random sample from each population and studying the difference in the two sample means. • Formula for a confidence interval to estimate (µ1 - µ2): $(\bar{x}_1 - \bar{x}_2) - z\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}} \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + z\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$ • Designating one group as group 1 and the other as group 2 is an arbitrary decision.
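A hedged Python sketch of the pension fund calculation follows. The dictionary layout and variable names are assumptions for illustration; the numbers are the sample sizes, means, and population standard deviations stated in the problem.

```python
import math
from statistics import NormalDist

women = {"mean": 3352, "sigma": 1100, "n": 87}   # Population 1
men = {"mean": 5727, "sigma": 1700, "n": 76}     # Population 2
alpha = 0.001                                    # lower-tailed test

std_error = math.sqrt(women["sigma"] ** 2 / women["n"] + men["sigma"] ** 2 / men["n"])
z = ((women["mean"] - men["mean"]) - 0) / std_error

z_crit = NormalDist().inv_cdf(alpha)             # lower-tail critical value
reject_h0 = z < z_crit

print(f"z = {z:.2f}, critical z = {z_crit:.2f}, reject H0: {reject_h0}")
```

`NormalDist().inv_cdf(0.001)` returns about -3.09, while the slides use the table value -3.08. The difference is immaterial here because the observed z of about -10.42 lies far beyond either cutoff.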
  • 30. Demonstration Problem 2 A consumer test group wants to determine the difference in gasoline mileage of cars using regular unleaded gas and cars using premium unleaded gas. Researchers for the group divided a fleet of 100 cars of the same make in half and tested each car on one tank of gas. Fifty of the cars were filled with regular unleaded gas and 50 were filled with premium unleaded gas. The sample average for the regular gasoline group was 21.45 miles per gallon (mpg), and the sample average for the premium gasoline group was 24.6 mpg. Assume that the population standard deviation of the regular unleaded gas population is 3.46 mpg, and that the population standard deviation of the premium unleaded gas population is 2.99 mpg. Construct a 95% confidence interval to estimate the difference in the mean gas mileage between the cars using regular gasoline and the cars using premium gasoline.
  • 31. Demonstration Problem 2
      Regular: $n_1 = 50$, $\bar{x}_1 = 21.45$, $\sigma_1 = 3.46$
      Premium: $n_2 = 50$, $\bar{x}_2 = 24.6$, $\sigma_2 = 2.99$
      95% confidence: z = 1.96
      $(\bar{x}_1 - \bar{x}_2) - z\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}} \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + z\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
      $(21.45 - 24.6) - 1.96\sqrt{\dfrac{3.46^2}{50} + \dfrac{2.99^2}{50}} \le \mu_1 - \mu_2 \le (21.45 - 24.6) + 1.96\sqrt{\dfrac{3.46^2}{50} + \dfrac{2.99^2}{50}}$
      $-4.42 \le \mu_1 - \mu_2 \le -1.88$
  • 32. Hypothesis Test for Two Populations with population variances unknown • Hypothesis test: compares the means of two samples to see whether there is a difference in the means of the two populations from which the samples come. This is used when $\sigma^2$ is unknown and the samples are independent. • Assumes that the measurement is normally distributed.
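The mileage interval can be verified with a few lines of Python. This is a sketch under the problem's stated assumptions (known population standard deviations, independent samples); the variable names are illustrative.

```python
import math
from statistics import NormalDist

n1, mean1, sigma1 = 50, 21.45, 3.46    # regular unleaded
n2, mean2, sigma2 = 50, 24.6, 2.99     # premium unleaded
confidence = 0.95

z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)        # about 1.96
margin = z * math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)

point_estimate = mean1 - mean2
lower, upper = point_estimate - margin, point_estimate + margin

print(f"{lower:.2f} <= mu1 - mu2 <= {upper:.2f}")
```

Because the whole interval lies below zero, the data suggest that cars using regular gasoline average fewer miles per gallon than cars using premium gasoline.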
  • 33. Hypothesis Test for Two Populations with population variances unknown If σ is unknown, it can be estimated by pooling the two sample variances and computing a pooled sample standard deviation
  • 34. t Statistic to test the Difference in Means: $\sigma_1^2 = \sigma_2^2$ $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2(n_1 - 1) + s_2^2(n_2 - 1)}{n_1 + n_2 - 2}}\,\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$
  • 35. Hernandez Manufacturing Company At the Hernandez Manufacturing Company, an application of this test arises. New employees are expected to attend a three-day seminar to learn about the company. At the end of the seminar, they are tested to measure their knowledge about the company. The traditional training method has been lecture and a question-and-answer session. Management decided to experiment with a different training procedure, which processes new employees in two days by using DVDs and having no question-and-answer session.
  • 36. Hernandez Manufacturing Company If this procedure works, it could save the company thousands of dollars over a period of several years. However, there is some concern about the effectiveness of the two-day method, and company managers would like to know whether there is any difference in the effectiveness of the two training methods. The managers randomly select a group of 15 from the old method (Method A) and a group of 12 from the proposed method (Method B) and test all on a set of questions. The following are the scores of the groups.
      Training Method A: 56 51 45 47 52 43 42 53 52 50 42 48 47 44 44
      Training Method B: 59 57 53 52 56 65 53 55 53 54 64 57
  • 37. Hernandez Manufacturing Company (Steps 1–4) Ho: $\mu_1 - \mu_2 = 0$ vs. Ha: $\mu_1 - \mu_2 \neq 0$; α = .05, α/2 = .025; df = $n_1 + n_2 - 2$ = 15 + 12 - 2 = 25; $t_{.025,\,25} = 2.060$. If t < -2.060 or t > 2.060, reject Ho. If -2.060 ≤ t ≤ 2.060, do not reject Ho. [Figure: t distribution with rejection regions of area α/2 = .025 in each tail and critical values $t_{.025,\,25} = \pm 2.060$.]
  • 38. Hernandez Manufacturing Company (Step 5)
      Training Method A: 56 51 45 47 52 43 42 53 52 50 42 48 47 44 44
      Training Method B: 59 57 53 52 56 65 53 55 53 54 64 57
      $n_1 = 15$, $\bar{x}_1 = 47.73$, $s_1^2 = 19.495$; $n_2 = 12$, $\bar{x}_2 = 56.5$, $s_2^2 = 18.273$
  • 39. Hernandez Manufacturing Company (Steps 6–7) With the population variances assumed equal, the pooled-variance statistic is $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2(n_1 - 1) + s_2^2(n_2 - 1)}{n_1 + n_2 - 2}}\,\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \dfrac{(47.73 - 56.50) - 0}{\sqrt{\dfrac{19.495(14) + 18.273(11)}{15 + 12 - 2}}\,\sqrt{\dfrac{1}{15} + \dfrac{1}{12}}} = -5.20$ with df = 25. If the variances cannot be assumed equal, the statistic is instead $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$ with $df = \dfrac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$. If t < -2.060 or t > 2.060, reject Ho. If -2.060 ≤ t ≤ 2.060, do not reject Ho. Since t = -5.20 < -2.060, reject Ho.
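The Hernandez calculation can be reproduced from the raw scores with the following Python sketch. It computes the pooled-variance t statistic used in the slides and, for comparison, the unequal-variances statistic with its approximate degrees of freedom. Variable names are assumptions, and the critical value 2.060 is copied from the slides because the standard library has no t table.

```python
import math
from statistics import mean, variance

method_a = [56, 51, 45, 47, 52, 43, 42, 53, 52, 50, 42, 48, 47, 44, 44]
method_b = [59, 57, 53, 52, 56, 65, 53, 55, 53, 54, 64, 57]

n1, n2 = len(method_a), len(method_b)
var1, var2 = variance(method_a), variance(method_b)    # sample variances
diff = mean(method_a) - mean(method_b)

# Pooled-variance t test (assumes equal population variances).
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
t_pooled = diff / (math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2))
df_pooled = n1 + n2 - 2

# Unequal-variances alternative with its approximate degrees of freedom.
a, b = var1 / n1, var2 / n2
t_unequal = diff / math.sqrt(a + b)
df_unequal = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

t_crit = 2.060   # t(.025, 25) from the slides
print(f"pooled: t = {t_pooled:.2f}, df = {df_pooled}, reject H0: {abs(t_pooled) > t_crit}")
print(f"unequal variances: t = {t_unequal:.2f}, df = {df_unequal:.1f}")
```

Both versions give a t value near -5.2, so the conclusion does not depend on the equal-variance assumption for these data.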
  • 40. Hernandez Manufacturing Company Business Implications (Step 8) • The conclusion is that there is a significant difference in the effectiveness of the training methods. • Given that training method B scores are significantly higher and the fact that the seminar is a day shorter than method A (saving time and money), it makes business sense to adopt method B as the standard training method.
  • 41. Statistical Inferences for Two Related Populations • Dependent samples – Used in before-and-after studies – The after measurement is not independent of the before measurement
  • 42. Hypothesis Testing • Researcher must determine if the two samples are related to each other • The technique for related samples is different from the technique used to analyze independent samples • Matched pairs test requires the two samples to be of the same size
  • 43. Dependent Samples • Before and after measurements on the same individual • Studies of twins • Studies of spouses
      Individual   Before   After
      1            32       39
      2            11       15
      3            21       35
      4            17       13
      5            30       41
      6            38       39
      7            14       22
  • 44. Hypothesis Testing • The matched pair t test for dependent measures uses the sample difference, d, between individual matched samples as the basic measurement of analysis • An analysis of d converts the problem from a two sample problem to a single sample of differences • The null hypothesis states that the mean population difference is zero • An assumption for the test is that the differences of two populations are normally distributed
  • 45. Hypothesis Testing: Formulas for Dependent Samples $t = \dfrac{\bar{d} - D}{s_d/\sqrt{n}}$, df = n - 1, where n = number of pairs, d = sample difference in pairs, D = mean population difference, $s_d$ = standard deviation of sample difference, and $\bar{d}$ = mean sample difference. $\bar{d} = \dfrac{\sum d}{n}$ and $s_d = \sqrt{\dfrac{\sum (d - \bar{d})^2}{n - 1}} = \sqrt{\dfrac{\sum d^2 - \dfrac{(\sum d)^2}{n}}{n - 1}}$
  • 46. Hypothesis Testing: Degree of Freedom • Analysis of data by this method involves calculating a t value and comparing it with a critical value obtained from the t table • n in the degrees of freedom (n – 1) is the number of matched pairs of scores
  • 47. P/E Ratios for Nine Randomly Selected Companies Suppose a stock market investor is interested in determining whether there is a significant difference in the P/E (price to earnings) ratio for companies from one year to the next. In an effort to study this question, the investor randomly samples nine companies from the Handbook of Common Stocks and records the P/E ratios for each of these companies at the end of year 1 and at the end of year 2.
  • 48. P/E Ratios for Nine Randomly Selected Companies
      Company   Year 1 P/E Ratio   Year 2 P/E Ratio
      1         8.9                12.7
      2         38.1               45.4
      3         43.0               10.0
      4         34.0               27.2
      5         34.5               22.8
      6         15.2               24.1
      7         20.3               32.3
      8         19.9               40.1
      9         61.9               106.5
  • 49. Hypothesis Testing for Dependent Samples: P/E Ratios for Nine Companies Ho: D = 0 vs. Ha: D ≠ 0; α = .01; df = n - 1 = 9 - 1 = 8; $t_{.005,\,8} = 3.355$. If t < -3.355 or t > 3.355, reject Ho. If -3.355 ≤ t ≤ 3.355, do not reject Ho. [Figure: t distribution with rejection regions of area α/2 = .005 in each tail and critical values t = ±3.355.]
  • 50. Hypothesis Testing for Dependent Samples: P/E Ratios for Nine Companies Ho: D = 0 vs. Ha: D ≠ 0; $\bar{d} = -5.033$, $s_d = 21.599$. $t = \dfrac{-5.033 - 0}{21.599/\sqrt{9}} = -0.70$ Since -3.355 ≤ t = -0.70 ≤ 3.355, do not reject Ho.
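As a minimal sketch, the matched-pairs t statistic for the P/E data can be computed in Python from the nine pairs of ratios. Differences are taken as Year 1 minus Year 2, which reproduces the sign of the mean difference in the slides; the variable names are assumptions, and the critical value 3.355 is copied from the slides.

```python
import math
from statistics import mean, stdev

year1 = [8.9, 38.1, 43.0, 34.0, 34.5, 15.2, 20.3, 19.9, 61.9]
year2 = [12.7, 45.4, 10.0, 27.2, 22.8, 24.1, 32.3, 40.1, 106.5]

# Reduce the two related samples to a single sample of differences.
d = [y1 - y2 for y1, y2 in zip(year1, year2)]
n = len(d)

d_bar = mean(d)                       # mean sample difference
s_d = stdev(d)                        # standard deviation of the differences
t = (d_bar - 0) / (s_d / math.sqrt(n))

t_crit = 3.355                        # t(.005, 8) from the slides
reject_h0 = abs(t) > t_crit
print(f"d_bar = {d_bar:.3f}, s_d = {s_d:.3f}, t = {t:.2f}, reject H0: {reject_h0}")
```

The statistic of about -0.70 falls well inside the nonrejection region, so there is no evidence of a change in P/E ratios between the two years.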
  • 51. Hypothesis Testing for Dependent Samples: P/E Ratios for Nine Companies – Software output
  • 52. Confidence Intervals for Mean Population Difference • Researcher can be interested in estimating the mean difference in two populations for related samples • This requires a confidence interval of D (the mean population difference of two related samples) to be constructed
  • 53. Confidence Interval for Mean Population Difference of Related Samples $\bar{d} - t\,\dfrac{s_d}{\sqrt{n}} \le D \le \bar{d} + t\,\dfrac{s_d}{\sqrt{n}}$, df = n - 1
  • 54. Difference in Number of New-House Sales
      Realtor   May 2010   May 2011   d
      1         8          11         -3
      2         19         30         -11
      3         5          6          -1
      4         9          13         -4
      5         3          5          -2
      6         0          4          -4
      7         13         15         -2
      8         11         17         -6
      9         9          12         -3
      10        5          12         -7
      11        8          6          2
      12        2          5          -3
      13        11         10         1
      14        14         22         -8
      15        7          8          -1
      16        12         15         -3
      17        6          12         -6
      18        10         10         0
      $\bar{d} = -3.39$, $s_d = 3.27$
  • 55. Confidence Interval for Mean Difference in Number of New-House Sales df = n - 1 = 18 - 1 = 17, $t_{.005,\,17} = 2.898$. $\bar{d} - t\,\dfrac{s_d}{\sqrt{n}} \le D \le \bar{d} + t\,\dfrac{s_d}{\sqrt{n}}$ gives $-3.39 - 2.898\,\dfrac{3.27}{\sqrt{18}} \le D \le -3.39 + 2.898\,\dfrac{3.27}{\sqrt{18}}$, so $-3.39 - 2.23 \le D \le -3.39 + 2.23$, that is, $-5.62 \le D \le -1.16$. The analyst estimates with a 99% level of confidence that the average difference in new-house sales for a real estate company in Indianapolis between May 2010 and May 2011 is between -5.62 and -1.16 houses.
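The new-house sales interval can be recomputed from the 18 pairs of monthly sales with a short Python sketch. Differences are taken as May 2010 minus May 2011, as in the d column of the table. The variable names are assumptions, and the t value 2.898 is the table value quoted in the slides.

```python
import math
from statistics import mean, stdev

may_2010 = [8, 19, 5, 9, 3, 0, 13, 11, 9, 5, 8, 2, 11, 14, 7, 12, 6, 10]
may_2011 = [11, 30, 6, 13, 5, 4, 15, 17, 12, 12, 6, 5, 10, 22, 8, 15, 12, 10]

d = [a - b for a, b in zip(may_2010, may_2011)]   # one difference per realtor
n = len(d)

d_bar = mean(d)
s_d = stdev(d)
t_value = 2.898                                   # t(.005, 17) for 99% confidence

margin = t_value * s_d / math.sqrt(n)
lower, upper = d_bar - margin, d_bar + margin
print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.2f}, interval: {lower:.2f} to {upper:.2f}")
```

Carrying full precision moves the endpoints by about 0.01 relative to the slide values of -5.62 and -1.16, which were computed from the rounded figures 3.39 and 2.23.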
  • 56. $z = \dfrac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}}$, where $\hat{p}_1$ = proportion from sample 1, $\hat{p}_2$ = proportion from sample 2, $n_1$ = size of sample 1, $n_2$ = size of sample 2, $p_1$ = proportion from population 1, $p_2$ = proportion from population 2, $q_1 = 1 - p_1$, and $q_2 = 1 - p_2$
  • 57. Statistical Inference about Two Population Proportions: Hypothesis Testing • Because the population proportions are unknown, the standard deviation of the difference in two sample proportions is estimated by using the sample proportions as point estimates of the population proportions
  • 58. Z Formula to Test the Difference in Population Proportions $z = \dfrac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\bar{p}\,\bar{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$, where $\bar{p} = \dfrac{x_1 + x_2}{n_1 + n_2} = \dfrac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$ and $\bar{q} = 1 - \bar{p}$
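The pooled z formula for two proportions can be sketched in Python as below. The transcript gives no data for this test, so the function name `two_proportion_z` and the counts in the usage lines are hypothetical, chosen only to show the arithmetic.

```python
import math

def two_proportion_z(x1, n1, x2, n2, hypothesized_diff=0.0):
    """Pooled z statistic for testing the difference in two population proportions."""
    p_hat1, p_hat2 = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)      # pooled proportion, equals (n1*p_hat1 + n2*p_hat2)/(n1 + n2)
    q_bar = 1 - p_bar
    std_error = math.sqrt(p_bar * q_bar * (1 / n1 + 1 / n2))
    return ((p_hat1 - p_hat2) - hypothesized_diff) / std_error

# Hypothetical counts: 48 successes out of 120 in sample 1, 30 out of 100 in sample 2.
z = two_proportion_z(48, 120, 30, 100)
print(f"z = {z:.2f}")
```

The resulting z would then be compared with a critical value from the standard normal distribution, in the same way as in the earlier z tests.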