Hypothesis testing

Hypothesis Testing for SPSS


  1. Hypothesis Testing
     1. T-test: Difference in means. Tests the statistical significance of a difference in means, e.g. income by gender, or the number of years at work by gender.
     2. T-test: Difference in proportions. Tests the statistical significance of a difference in proportions, e.g. the proportion employed in government jobs by gender.
     3. Contingency Table/Chi-Square Analysis. Tests whether all categories contain the same proportion of values by comparing expected and actual values, e.g. the proportion employed in government jobs by gender.
  2. Hypothesis Testing Procedure
     1. A Research Question
     2. The Null Hypothesis: usually assumes NO difference (two-tailed test)
     3. Select Cases
     4. T-test or Contingency/Chi-Square Analysis
     5. Interpret Test Results: t-score, significance level, confidence interval, likelihood ratio (for Chi-Square Analysis)
     6. “Reject” or “Not reject” the null hypothesis
  3. Hypothesis Testing
     • Research Question: Are there differences in income between male and female graduates and, if so, what factors might explain this difference?
     1. Is there a difference in average income between male and female graduates?
     2. Is there a significant difference in average length of time on the job between male and female graduates?
     3. Is there a difference in the proportion employed in government jobs between males and females?
  4. Analysis 1: T-test: Difference in Means
     Research Question: Is there a difference in average income between male and female graduates?
     H0: There is NO difference in average income between male and female graduates.
     Note: Limit the data to full-time employees or the self-employed with income more than $20,000 and less than $400,000.
  5. Step 1: Data/Select Cases
     • Select Data/Select Cases
  6. Data/Select Cases
     • In the Select Cases dialog box, specify logical expressions to select cases.
       – Select the “If condition is satisfied” option
       – Click the If… button
  7. Data/Select Cases: Specifying fullself and income range
     Type the logical expression
       fullself = 1 & income > 20000 & income < 400000
     to limit cases to alumni who work full-time or are self-employed and make more than $20,000 and less than $400,000.
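The Select Cases condition on slide 7 can be sketched outside SPSS as a simple filter. This is a minimal illustration in Python, not SPSS syntax; the records below are hypothetical toy data, and only the variable names `fullself` and `income` come from the slide.

```python
# Hypothetical alumni records; field names mirror the slide's variables.
records = [
    {"fullself": 1, "income": 55000},
    {"fullself": 1, "income": 18000},   # income too low
    {"fullself": 0, "income": 90000},   # not full-time or self-employed
    {"fullself": 1, "income": 400000},  # exactly at the bound, excluded by "<"
    {"fullself": 1, "income": 120000},
]


def is_selected(rec):
    """Mirror of: fullself = 1 & income > 20000 & income < 400000."""
    return rec["fullself"] == 1 and 20000 < rec["income"] < 400000


selected = [rec for rec in records if is_selected(rec)]
print([rec["income"] for rec in selected])
```

Both bounds are strict, so a case with income of exactly $20,000 or $400,000 is dropped.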
  8. Data/Select Cases
  9. Data/Select Cases
  10. Step 2: Independent T-Test
      Analyze/Compare Means/Independent-Samples T-Test
  11. Step 2: Independent T-Test
      Test variable: income. Grouping variable: gender(? ?)
  12. Step 2: Independent T-Test
      Group 1: 1 for female. Group 2: 2 for male.
      Note: The grouping variable can only have two categories.
  13. Step 2: Independent T-Test
      Grouping variable: gender(1 2)
  14. T-test: Results
      Using the Unequal Variance model, we REJECT H0 and conclude that there is a significant difference in average income between male and female graduates.

      Group Statistics (Income)
      Gender    N     Mean        Std. Deviation   Std. Error Mean
      Female    128   79868.22    35165.875        3108.254
      Male      137   98606.49    47980.995        4099.293

      Independent Samples Test (Income)
      Levene's Test for Equality of Variances: F = 10.443, Sig. = .001
      t-test for Equality of Means:
                                    t        df        Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
      Equal variances assumed       -3.605   263       .000              -18738.270        5197.537                -28972.4       -8504.190
      Equal variances not assumed   -3.642   249.145   .000              -18738.270        5144.458                -28870.4       -8606.100

      Reading the output: |t| = 3.642 > 1.96; Sig. (2-tailed) = .000 < 0.05; the mean difference is -18,738; the 95% confidence interval of the difference doesn't include 0.
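The "Equal variances not assumed" row on slide 14 can be checked by hand from the Group Statistics alone. The sketch below assumes that row is the standard Welch t-test with the Welch-Satterthwaite degrees of freedom; the function name `welch_t` is illustrative and not part of SPSS.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df, standard error) for the unequal-variance t-test."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1's mean
    v2 = sd2 ** 2 / n2  # squared standard error of group 2's mean
    se = math.sqrt(v1 + v2)
    t = (mean1 - mean2) / se
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, se


# Female (group 1) vs. male (group 2) income, from the Group Statistics table
t, df, se = welch_t(79868.22, 35165.875, 128, 98606.49, 47980.995, 137)
print(f"t = {t:.3f}, df = {df:.3f}, SE = {se:.3f}")

# Decision rule used on the slides: reject H0 when |t| exceeds 1.96
reject_h0 = abs(t) > 1.96
```

The printed values should land close to the reported t = -3.642, df = 249.145 and standard error 5144.458.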
  15. Analysis 1-2: T-test: Difference in Means
      Possible explanation for the difference in income: Male income is higher because men have been on the job longer than women.
      Research Question: Is there a difference in average length of time on the job (YEARS) between male and female graduates?
      H0: There is NO difference in length of time on the job between male and female graduates.
  16. Step 2: Independent T-Test
      Analyze/Compare Means/Independent-Samples T-Test
  17. Step 2: Independent T-Test
      Test variable: Years at Current Position [years]. Grouping variable: gender(1 2)
  18. T-test: Results
      Using the Unequal Variance model, we REJECT H0 and conclude that there is a significant difference in average length of time on the job between male and female graduates.

      Group Statistics (Years at Current Position)
      Gender    N     Mean   Std. Deviation   Std. Error Mean
      Female    128   4.15   4.315            .381
      Male      137   5.90   5.764            .492

      Independent Samples Test (Years at Current Position)
      Levene's Test for Equality of Variances: F = 13.386, Sig. = .000
      t-test for Equality of Means:
                                    t        df        Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
      Equal variances assumed       -2.786   263       .006              -1.752            .629                    -2.991         -.514
      Equal variances not assumed   -2.813   251.276   .005              -1.752            .623                    -2.979         -.525

      Reading the output: |t| = 2.813 > 1.96; Sig. (2-tailed) = .005 < 0.05; the mean difference is -1.752; the 95% confidence interval of the difference does not include 0.
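The confidence interval on slide 18 can be approximated from the mean difference and its standard error. This sketch uses the normal critical value (about 1.96, the cutoff the slides use) rather than the t critical value SPSS applies, so it is an approximation and its bounds differ slightly from the reported ones. The helper name `approx_ci` is hypothetical.

```python
from statistics import NormalDist


def approx_ci(mean_diff, se, level=0.95):
    """Normal-approximation confidence interval for a mean difference."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    return mean_diff - z * se, mean_diff + z * se


# Mean difference and standard error from the "Equal variances not assumed" row
lower, upper = approx_ci(-1.752, 0.623)
print(f"95% CI is roughly ({lower:.3f}, {upper:.3f})")

# H0 (no difference) is rejected when the interval does not contain 0
excludes_zero = not (lower <= 0 <= upper)
```

With about 251 degrees of freedom the t critical value is only slightly larger than 1.96, which is why the approximation stays close to the SPSS interval of (-2.979, -.525).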
  19. Analysis 2: T-test: Difference in Proportions
      Possible explanation for the difference in income: Male income is higher because more females than males work for the government.
      Research Question: Is there a difference in the proportion employed in government jobs between male and female graduates?
      H0: There is NO difference in the proportion employed in government jobs between male and female graduates.
  20. Step 1: Create a new variable (GOV)
      • Create a new variable GOV that
        – has the value 1 if EMPLOYER (1-6) indicates the alumnus works for a government organization.
        – has the value 0 if EMPLOYER is not 1-6.
      Either method works:
      1. Use Transform/Compute to convert the EMPLOYER variable into a new categorical variable GOV.
      2. Use Transform/Recode/Into Different Variables to create a new categorical variable GOV.
  21. OUTPUT: Analyze/Descriptive Statistics/Frequencies

      Employer
                                  Frequency   Percent   Valid Percent   Cumulative Percent
      Gov: Federal                8           2.9       2.9             2.9
      Gov: State                  12          4.3       4.3             7.2
      Gov: County                 13          4.7       4.7             11.9
      Gov: City                   45          16.2      16.2            28.2
      Gov: Special Agency         17          6.1       6.1             34.3
      Gov: Non U.S.               3           1.1       1.1             35.4
      Private: Single Person      5           1.8       1.8             37.2
      Private: 2-4 Persons        5           1.8       1.8             39.0
      Private: 5-19 Persons       20          7.2       7.2             46.2
      Private: 20-49 Persons      11          4.0       4.0             50.2
      Private: >= 50 Persons      47          16.9      17.0            67.1
      Non-Profit (U.S.)           51          18.3      18.4            85.6
      International Org.          4           1.4       1.4             87.0
      Educational Inst.           25          9.0       9.0             96.0
      Other                       11          4.0       4.0             100.0
      Total (Valid)               277         99.6      100.0
      Missing (System)            1           .4
      Total                       278         100.0

      Annotations: values 1-6 are Government; values 7-11 are Private; the one System row is a missing value.
  22. Transform/Recode/Into Different Variables
  23. Transform/Recode/Into Different Variables
      Select the employer variable, type “GOV” as the new variable name, click the “Change” button, then click the “Old and New Values” button…
  24. Transform/Recode/Into Different Variables
  25. Transform/Recode/Into Different Variables
  26. Transform/Recode/Into Different Variables
  27. Transform/Recode/Into Different Variables
  28. Transform/Recode/Into Different Variables
      Save the data file!
  29. Step 2: Create a frequency table for GOV
      • Analyze/Descriptive Statistics/Frequencies
      Thirty-five percent of the graduates who are employed full-time or self-employed and make more than $20,000 and less than $400,000 work in government jobs.

      Government Job
                         Frequency   Percent   Valid Percent   Cumulative Percent
      No                 179         64.4      64.6            64.6
      Yes                98          35.3      35.4            100.0
      Total (Valid)      277         99.6      100.0
      Missing (System)   1           .4
      Total              278         100.0
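The recode from EMPLOYER to GOV and the resulting frequency can be cross-checked against the Employer frequency table on slide 21. This is a minimal sketch that assumes the employer categories are coded 1 to 15 in the order they appear in that table (the slide only states that 1-6 are government and 7-11 are private). The system-missing case is left out.

```python
# Employer frequencies from slide 21, keyed by an ASSUMED code 1-15
# that follows the order of the rows in the output table.
employer_counts = {
    1: 8, 2: 12, 3: 13, 4: 45, 5: 17, 6: 3,   # government categories
    7: 5, 8: 5, 9: 20, 10: 11, 11: 47,        # private categories
    12: 51, 13: 4, 14: 25, 15: 11,            # non-profit, international, education, other
}


def recode_gov(employer):
    """GOV = 1 when EMPLOYER is in 1-6, otherwise 0."""
    return 1 if 1 <= employer <= 6 else 0


gov_yes = sum(n for code, n in employer_counts.items() if recode_gov(code) == 1)
valid_total = sum(employer_counts.values())
valid_percent = round(100 * gov_yes / valid_total, 1)
print(gov_yes, valid_total, valid_percent)
```

The totals agree with the Government Job table: 98 "Yes" cases out of 277 valid cases.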
  30. Step 2: Independent T-Test
      Analyze/Compare Means/Independent-Samples T-Test
  31. Step 2: Independent T-Test
      Test variable: gov. Grouping variable: gender(1 2)
  32. T-test: Results
      Using the Unequal Variance model, we CANNOT REJECT H0 and cannot conclude that there is a significant difference between male and female graduates with respect to the proportion working in the government sector.

      Group Statistics (Government Job)
      Gender    N     Mean    Std. Deviation   Std. Error Mean
      Female    127   .3543   .48020           .04261
      Male      137   .3650   .48319           .04128

      Independent Samples Test (Government Job)
      Levene's Test for Equality of Variances: F = .129, Sig. = .720
      t-test for Equality of Means:
                                    t       df        Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
      Equal variances assumed       -.179   262       .858              -.01063           .05934                  -.12748        .10622
      Equal variances not assumed   -.179   260.726   .858              -.01063           .05933                  -.12746        .10619

      Reading the output: |t| = 0.179 < 1.96; Sig. (2-tailed) = .858 > 0.05; the 95% confidence interval of the difference includes 0.
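Because GOV is coded 0/1, each group mean on slide 32 is simply the proportion working in government, so the comparison can also be written as a two-proportion z-test. The sketch below is a swapped-in textbook method, not what SPSS runs here. The counts 45 of 127 women and 50 of 137 men are inferred from the reported means (.3543 × 127 and .3650 × 137), and the function name is illustrative.

```python
import math
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (p1, p2, z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * NormalDist().cdf(-abs(z))
    return p1, p2, z, p_value


# Government employees: 45 of 127 women, 50 of 137 men
p_f, p_m, z, p_value = two_proportion_z(45, 127, 50, 137)
print(f"female = {p_f:.4f}, male = {p_m:.4f}, z = {z:.3f}, p = {p_value:.3f}")

cannot_reject_h0 = abs(z) < 1.96
```

The z statistic is close to the reported t of -.179, and its square is the Pearson chi-square for the same 2x2 table, which is why the t-test and the contingency-table approach lead to the same conclusion.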
  33. Analysis 3: Contingency Table/Chi-Square Analysis
      The same question can be analyzed with a contingency table of GOV by GENDER, tested using the Chi-Square statistic.
      H0: There is NO relationship between employment sector and gender.
  34. Analyze/Descriptive Statistics/Crosstabs
  35. Analyze/Descriptive Statistics/Crosstabs
      Counts: Observed
      Percentages: Row, Column
      Select “gov” for “Row” and “Gender” for “Column.”
  36. Contingency table: Analyze/Descriptive Statistics/Crosstabs

      Government Job * Gender Crosstabulation
      Government Job                              Female   Male     Total
      No      Count                               82       87       169
              % within Government Job             48.5%    51.5%    100.0%
              % within Gender                     64.6%    63.5%    64.0%
      Yes     Count                               45       50       95
              % within Government Job             47.4%    52.6%    100.0%
              % within Gender                     35.4%    36.5%    36.0%
      Total   Count                               127      137      264
              % within Government Job             48.1%    51.9%    100.0%
              % within Gender                     100.0%   100.0%   100.0%
  37. Contingency table: Analyze/Descriptive Statistics/Crosstabs
      Chi-Square value = 0.032 < 3.84 (1.96² = 3.84 is the cutoff value at the 95% confidence level with 1 df), and Asymp. Sig. = .857 > 0.05. We CANNOT REJECT the null hypothesis and cannot conclude there is a statistically significant relationship between gender and whether or not a person works for the government.

      Chi-Square Tests
                                  Value     df   Asymp. Sig. (2-sided)   Exact Sig. (2-sided)   Exact Sig. (1-sided)
      Pearson Chi-Square          .032(b)   1    .857
      Continuity Correction(a)    .003      1    .959
      Likelihood Ratio            .032      1    .857
      Fisher's Exact Test                                                .898                   .480
      N of Valid Cases            264
      a. Computed only for a 2x2 table.
      b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 45.70.
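The Pearson Chi-Square value on slide 37 compares each observed count with the count expected if sector and gender were unrelated. As a rough sketch, it can be recomputed from the observed counts in the slide-36 crosstab. The function name is illustrative, and the p-value shortcut used here holds only for 1 degree of freedom.

```python
import math


def pearson_chi_square(table):
    """Return (chi-square, expected counts) for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under H0: row total * column total / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return chi2, expected


# Rows: Government Job No / Yes; columns: Female / Male
observed = [[82, 87], [45, 50]]
chi2, expected = pearson_chi_square(observed)
min_expected = min(min(row) for row in expected)

# With 1 df, chi-square is a squared standard normal, so P(X > chi2) = erfc(sqrt(chi2 / 2))
p_value = math.erfc(math.sqrt(chi2 / 2))
print(f"chi2 = {chi2:.3f}, min expected = {min_expected:.2f}, p = {p_value:.3f}")
```

The result should be close to the reported value of .032 with a minimum expected count of 45.70, well below the 3.84 cutoff.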
  38. OUTPUT: Analyze/Descriptive Statistics/Frequencies
      The Employer frequency table is the same output shown on slide 21 (277 valid cases, 1 missing value). Here the annotation marks values 7-11 as Private:

                                  Frequency   Percent   Valid Percent
      Private: Single Person      5           1.8       1.8
      Private: 2-4 Persons        5           1.8       1.8
      Private: 5-19 Persons       20          7.2       7.2
      Private: 20-49 Persons      11          4.0       4.0
      Private: >= 50 Persons      47          16.9      17.0
  39. Analysis 3-2: Contingency Table/Chi-Square Analysis
      How about analyzing the difference in the proportion of males and females in the private sector with a contingency table of PRIVATE by GENDER?
      H0: There is NO relationship between employment sector and gender.
  40. Step 1: Create a new variable (PRIVATE)
      • Create a new variable PRIVATE that
        – has the value 1 if EMPLOYER (7-11) indicates the alumnus works for a private organization.
        – has the value 0 if EMPLOYER is not 7-11 (else).
      Method 2: Use Transform/Recode/Into Different Variables to create a new categorical variable PRIVATE.
  41. Analyze/Descriptive Statistics/Crosstabs
      Counts: Observed
      Percentages: Row, Column
      Select “private” for “Row” and “Gender” for “Column.”
  42. Contingency table: Analyze/Descriptive Statistics/Crosstabs

      Private Sector Job * Gender Crosstabulation
      Private Sector Job                          Female   Male     Total
      .00     Count                               92       87       179
              % within Private Sector Job         51.4%    48.6%    100.0%
              % within Gender                     72.4%    63.5%    67.8%
      1.00    Count                               35       50       85
              % within Private Sector Job         41.2%    58.8%    100.0%
              % within Gender                     27.6%    36.5%    32.2%
      Total   Count                               127      137      264
              % within Private Sector Job         48.1%    51.9%    100.0%
              % within Gender                     100.0%   100.0%   100.0%
  43. Contingency table: Analyze/Descriptive Statistics/Crosstabs
      Chi-Square value = 2.411 < 3.84 (1.96²), and Asymp. Sig. = .120 > 0.05. We CANNOT REJECT the null hypothesis and cannot conclude that the difference in the proportion of males and females in the private sector is statistically significant.

      Chi-Square Tests
                                  Value      df   Asymp. Sig. (2-sided)   Exact Sig. (2-sided)   Exact Sig. (1-sided)
      Pearson Chi-Square          2.411(b)   1    .120
      Continuity Correction(a)    2.019      1    .155
      Likelihood Ratio            2.422      1    .120
      Fisher's Exact Test                                                 .147                   .077
      N of Valid Cases            264
      a. Computed only for a 2x2 table.
      b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 40.89.
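The Likelihood Ratio row on slide 43 is an alternative to the Pearson statistic that uses the log of observed over expected counts. The sketch below assumes it is the usual G statistic, 2 · Σ O · ln(O / E), and recomputes both statistics from the observed counts in the slide-42 crosstab. The helper names are illustrative, and the formula as written requires every observed count to be positive.

```python
import math


def expected_counts(table):
    """Expected counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_statistics(table):
    """Return (Pearson chi-square, likelihood-ratio G) for a table of positive counts."""
    pairs = [
        (o, e)
        for obs_row, exp_row in zip(table, expected_counts(table))
        for o, e in zip(obs_row, exp_row)
    ]
    pearson = sum((o - e) ** 2 / e for o, e in pairs)
    g = 2 * sum(o * math.log(o / e) for o, e in pairs)
    return pearson, g


# Rows: Private Sector Job .00 / 1.00; columns: Female / Male
observed = [[92, 87], [35, 50]]
pearson, g = chi_square_statistics(observed)
print(f"Pearson = {pearson:.3f}, likelihood ratio = {g:.3f}")
```

Both values should sit near the reported 2.411 and 2.422, and both fall below the 3.84 cutoff for 1 degree of freedom.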
  44. The degrees of freedom in the chi-square test
      The degrees of freedom in the chi-square test of a contingency table: d.o.f = (r-1)*(c-1), where r and c are the number of rows and columns (the number of categories of the two variables) in the table.
      The number of d.o.f is the number of comparisons between actual and expected frequencies minus the number of restrictions imposed on these frequencies.
      Since the number of cells in a contingency table is r*c, there are r*c actual frequencies to be compared with the corresponding expected frequencies.
      Because the sums (totals) of the frequencies in each row and each column are given, there are r+c-1 restrictions.
      Therefore, the number of d.o.f is: r*c - (r+c-1) = (r-1)*(c-1).
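The counting argument on slide 44 can be checked numerically. This small sketch computes the degrees of freedom as cells minus restrictions and confirms that this matches (r-1)*(c-1) for a range of table sizes. The function name is illustrative.

```python
def degrees_of_freedom(rows, cols):
    """Cells compared minus restrictions imposed by the row and column totals."""
    cells = rows * cols
    restrictions = rows + cols - 1
    return cells - restrictions


# The two expressions agree for every table size tried here
for r in range(2, 6):
    for c in range(2, 6):
        assert degrees_of_freedom(r, c) == (r - 1) * (c - 1)

# A 2x2 table such as GOV by GENDER has 1 degree of freedom
dof_2x2 = degrees_of_freedom(2, 2)

# With 1 df, the chi-square cutoff is the squared normal cutoff used for the t-tests
cutoff = 1.96 ** 2
print(dof_2x2, round(cutoff, 2))
```

This is why the 2x2 tables on the earlier slides are compared against 3.84, the square of 1.96.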
  45. Extensions to the Analysis
      • What other factors may influence income?
      • Control for job sector (government, private, non-profit), and examine the difference in average income between males and females within each sector.
        – Select cases: Data/Select Cases
            if STATUS = 1 & INCOME > 20000 & INCOME < 400000 & GOV = 1
            if STATUS = 1 & INCOME > 20000 & INCOME < 400000 & PRIVATE = 1
        – Compare Means/Independent-Samples T-Test
      • If we see differences within each sector, other factors besides job sector are influencing income.