General Approach to Hypothesis Testing Malhotra et al. 2005, 293
What is a hypothesis?
• A hypothesis is an unproven proposition or supposition that tentatively explains certain facts or phenomena.
  – It must be empirically testable.
• The null hypothesis is a statement about the status quo.
  – Any departure from what has been thought to be true is attributed entirely to random sampling error.
• The alternative hypothesis is a statement indicating the opposite of the null.
Null and alternative hypotheses
• Example: highly dogmatic consumers will be less likely to try a new product than less dogmatic consumers.
• The null hypothesis (H0) states that there is no difference between high dogmatics and low dogmatics in their willingness to try an innovation.
• The alternative hypothesis (H1) states that there is a difference between high and low dogmatics.
Hypothesis testing
• The purpose of hypothesis testing is to determine which of the two hypotheses the sample evidence supports.
The hypothesis-testing procedure
• The process is as follows:
  – State a statistical hypothesis.
  – Imagine what the sampling distribution would be if the hypothesis were true.
  – Take an actual sample and calculate the sample mean.
  – Determine whether the deviation between the obtained and expected values of the sample mean could have occurred by chance alone.
  – Set a standard for deciding whether or not to reject the null hypothesis.
The hypothesis-testing procedure
• The significance level is the critical probability used in choosing between the null and alternative hypotheses.
  – It is the probability level that is too low to warrant support of the null hypothesis.
• Assuming the null is true, if the probability of observing the sample data is smaller than the significance level, the data suggest that the null should be rejected.
  – Such data are evidence contradicting the null.
Type I and Type II errors
• Because hypothesis testing is based on probability theory, the researcher cannot be completely certain and runs the risk of committing one of two types of error.
  – A Type I error is rejecting the null hypothesis when it is true.
    • Its probability is alpha (α).
  – A Type II error is failing to reject the null hypothesis when the alternative hypothesis is true.
    • Its probability is beta (β).
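The meaning of α can be illustrated with a small simulation: when the null hypothesis really is true, a test at the 5% level should still reject it in roughly 5% of samples. The sketch below is illustrative only; the population mean of 20, standard deviation of 5, sample size of 25, the critical t of 2.064 (24 degrees of freedom, two-tailed) and the function name `type_i_error_rate` are all assumptions chosen for the example.

```python
import random
import statistics

def type_i_error_rate(trials=2000, n=25, mu0=20.0, sigma=5.0, t_crit=2.064, seed=0):
    """Estimate how often a TRUE null (population mean == mu0) is rejected."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population in which H0 holds exactly.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        std_err = statistics.stdev(sample) / n ** 0.5
        t_obs = (statistics.mean(sample) - mu0) / std_err
        if abs(t_obs) > t_crit:  # two-tailed rejection rule
            rejections += 1
    return rejections / trials

rate = type_i_error_rate()
print(f"Estimated Type I error rate: {rate:.3f}")
```

The estimate should land close to 0.05; it will not equal it exactly because it is itself based on a finite number of simulated samples.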
The t-distribution
• The t-distribution is a symmetrical, bell-shaped distribution whose shape depends on sample size.
  – It has a mean of zero and a standard deviation slightly greater than one, approaching one as the sample size grows.
• The shape of the t-distribution is determined by its degrees of freedom.
  – Degrees of freedom: the number of observations minus the number of constraints or assumptions.
  – For sample sizes over 30, the t-distribution closely approximates the Z-distribution.
Univariate hypothesis test using the t-distribution
• A store manager believes that the average number of customers who return or exchange merchandise is 20.
  – H0: µ = 20 and H1: µ ≠ 20
• The store records returns and exchanges for 25 days (n = 25); the sample mean is 22 and the sample standard deviation is 5.
• The confidence level is 95% (significance level of 5%).
• Referring to Table B.3 in Appendix B, for 24 degrees of freedom (n – 1) the critical t-value is 2.064. Thus:
\[ \text{Lower limit} = \mu - t_{c.l.}\frac{S}{\sqrt{n}} = 20 - 2.064\left(\frac{5}{\sqrt{25}}\right) = 17.936 \]
\[ \text{Upper limit} = \mu + t_{c.l.}\frac{S}{\sqrt{n}} = 20 + 2.064\left(\frac{5}{\sqrt{25}}\right) = 22.064 \]
• Since the sample mean lies within the critical limits, the null hypothesis cannot be rejected.
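The critical-limits calculation can be reproduced in a few lines. This is a minimal sketch using the figures from the store example; the critical t-value is typed in from the table rather than computed, and the variable names are illustrative.

```python
import math

mu0 = 20.0      # hypothesised mean under H0
n = 25          # number of days observed
s = 5.0         # sample standard deviation
x_bar = 22.0    # sample mean
t_crit = 2.064  # tabled t for 24 df, 5% significance, two-tailed

std_err = s / math.sqrt(n)          # standard error of the mean
lower = mu0 - t_crit * std_err      # lower critical limit
upper = mu0 + t_crit * std_err      # upper critical limit

# H0 is rejected only if the sample mean falls outside the limits.
reject_h0 = not (lower <= x_bar <= upper)
print(f"limits: {lower:.3f} to {upper:.3f}; reject H0: {reject_h0}")
```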
Univariate hypothesis test using the t-distribution
• Alternatively, we may test a hypothesis by calculating the observed t-value and comparing it with the critical t-value.
• To calculate the observed t:
\[ t_{obs} = \frac{\bar{X} - \mu}{S_{\bar{X}}} = \frac{22 - 20}{5/\sqrt{25}} = \frac{2}{1} = 2 \]
• Referring to Table B.3 in Appendix B, for 24 degrees of freedom (n – 1) the critical t-value is 2.064.
• Since the observed t-value (2) is less than the critical t-value (2.064), the sample mean lies within the critical limits.
• Thus, the null hypothesis cannot be rejected.
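The observed-t route gives the same decision. The sketch below assumes the same store figures; `one_sample_t` is a hypothetical helper name, and the critical value is again taken from the table.

```python
import math

def one_sample_t(sample_mean, mu0, sample_sd, n):
    """Observed t: distance of the sample mean from mu0 in standard-error units."""
    return (sample_mean - mu0) / (sample_sd / math.sqrt(n))

t_obs = one_sample_t(sample_mean=22, mu0=20, sample_sd=5, n=25)
t_crit = 2.064                      # tabled value for 24 df, two-tailed 5%
reject_h0 = abs(t_obs) > t_crit     # two-tailed decision rule
print(t_obs, reject_h0)
```

Comparing |t| with the critical value is equivalent to checking whether the sample mean lies outside the critical limits, so the two methods always agree.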
The Chi-square test for goodness of fit
• The Chi-square (χ²) test allows for investigation of statistical significance in the analysis of a frequency distribution.
• It applies to categorical data on variables such as sex or education.
• It allows us to compare the observed frequencies with the expected frequencies based on theoretical ideas about the population distribution.
• It tests whether the data came from a certain probability distribution.
The Chi-square test for goodness of fit
• The process is as follows:
  – Formulate the null hypothesis and determine the expected frequency of each answer.
  – Determine the appropriate significance level.
  – Calculate the χ² value, using the observed frequencies from the sample and the expected frequencies.
  – Make the statistical decision by comparing the calculated χ² value with the critical χ² value.
The Chi-square test for goodness of fit
• To calculate the Chi-square statistic:
\[ \chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} \]
  – where χ² is the Chi-square statistic, O_i is the observed frequency in the ith cell, and E_i is the expected frequency in the ith cell.
• Table 12.15 shows that the calculated Chi-square value is 4.
• The degrees of freedom are the number of cells associated with column or row data minus one (k – 1).
  – k is 2 since there are only two categorical responses.
• Referring to Table B.4 in Appendix B, for 1 degree of freedom (k – 1) the critical Chi-square value is 3.84.
• Since the calculated value is larger than the critical value, the null hypothesis is rejected.
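The statistic is simple to compute directly. Table 12.15 is not reproduced in these slides, so the counts below are hypothetical, chosen only so that two categories give a Chi-square of 4; `chi_square_gof` is an illustrative helper name, and the critical value 3.84 is the tabled figure quoted above.

```python
def chi_square_gof(observed, expected):
    """Chi-square statistic: sum of (O - E)^2 / E over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts (NOT the data of Table 12.15).
observed = [60, 40]
expected = [50, 50]   # what H0 predicts for 100 respondents split evenly

chi_sq = chi_square_gof(observed, expected)
critical = 3.84                   # tabled value, 1 df, 5% level
reject_h0 = chi_sq > critical
print(chi_sq, reject_h0)
```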
Choosing the appropriate statistical technique
• The choice of statistical analysis depends on:
  – The type of question to be answered
    • Example: a researcher concerned with comparing a central value, such as average monthly sales, would use a t-test.
  – The number of variables
    • Example: a researcher concerned with one variable at a time would use univariate statistical analysis.
  – The scale of measurement
    • Example: testing a hypothesis about a mean requires interval-scaled or ratio-scaled data.
• Parametric versus nonparametric statistics.
Parametric versus nonparametric hypothesis tests
• Parametric statistics are used when the data are interval or ratio scaled, when the sample size is large, and when the data are drawn from a population with a normal distribution.
• Nonparametric statistics are used when the data are either nominal or ordinal.
Using the Normal tables
Using the p-value
One-Sample Statistics (How much have you spent, in total, on Internet shopping over the past 12 months?)
  N     Mean        Std. Deviation   Std. Error Mean
  448   1150.5960   2705.08330       127.80317
One-Sample Test (Test Value = 800)
  t       df    Sig. (2-tailed)   Mean Difference   95% CI of the Difference (Lower, Upper)
  2.743   447   .006              350.5960          99.4263, 601.7657
H0: µ ≤ 800
H1: µ > 800
• If p-value ≤ 0.05, reject H0.
• Conclude that the average amount spent on the Internet is more than $800 per year.
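The t-value in the output can be recovered from the summary statistics alone. The sketch below does that; it also halves the reported two-tailed significance to suit the one-tailed H1, which is only valid when the sample mean falls on the side H1 predicts. The p-value is copied from the output, not recomputed, and the variable names are illustrative.

```python
import math

n = 448
mean = 1150.5960
sd = 2705.08330
test_value = 800.0
p_two_tailed = 0.006   # "Sig. (2-tailed)" as reported; not recomputed here

std_err = sd / math.sqrt(n)               # standard error of the mean
t_obs = (mean - test_value) / std_err     # observed t

# For H1: mu > 800, halve the two-tailed p only if the mean exceeds 800.
if t_obs > 0:
    p_one_tailed = p_two_tailed / 2
else:
    p_one_tailed = 1 - p_two_tailed / 2

reject_h0 = p_one_tailed <= 0.05
print(f"SE = {std_err:.3f}, t = {t_obs:.3f}, one-tailed p = {p_one_tailed:.3f}")
```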
Types of Hypothesis Tests
Parametric Tests
One-sample t-test
• We are testing the hypothesis that the mean satisfaction rating exceeds 4.0, the neutral value on a 7-point scale.
H0: µ ≤ 4
H1: µ > 4
Parametric Tests cont.
One-Sample Statistics (Shopping at this website is usually a satisfying experience)
  N     Mean   Std. Deviation   Std. Error Mean
  443   5.19   1.079            .051
One-Sample Test (Test Value = 4)
  t        df    Sig. (2-tailed)   Mean Difference   95% CI of the Difference (Lower, Upper)
  23.112   442   .000              1.19              1.08, 1.29
• The p-value < 0.05, hence reject H0 and conclude that the mean satisfaction rating for the website is greater than 4 (respondents generally agree).
Parametric Tests cont.
Two independent samples (means)
• We are testing the hypothesis that the mean amount spent on Internet shopping is different for males and females.
H0: µ1 = µ2
H1: µ1 ≠ µ2
Group Statistics (How much have you spent, in total, on Internet shopping over the past 12 months?)
  Gender   N     Mean        Std. Deviation   Std. Error Mean
  Male     236   1283.4237   3502.02542       227.96244
  Female   212   1002.7311   1342.04673       92.17215
Parametric Tests cont.
Independent Samples Test (How much have you spent, in total, on Internet shopping over the past 12 months?)
Levene's Test for Equality of Variances: F = 2.166, Sig. = .142
t-test for Equality of Means:
                                t       df        Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
  Equal variances assumed       1.097   446       .273              280.6926          255.91581               -222.258       783.64323
  Equal variances not assumed   1.142   308.922   .255              280.6926          245.89139               -203.141       764.52641
• Since the p-value for Levene's test (.142) > 0.05, the t-test assuming equal variances should be used.
• Since the p-value for that t-test (.273) > 0.05, we do not reject H0 and conclude that there is no significant difference between men and women in the amount they spend on Internet shopping.
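Both rows of the t-test output can be reproduced from the group means, standard deviations and sample sizes. The sketch below is an illustration under that assumption: `pooled_t` corresponds to the "equal variances assumed" row and `welch_t` to the "equal variances not assumed" row (using the Welch-Satterthwaite degrees of freedom). The function names are hypothetical, and the summary figures are those reported for the male and female Internet shoppers.

```python
import math

def pooled_t(m1, s1, n1, m2, s2, n2):
    """t statistic and df assuming equal population variances."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / std_err, df

def welch_t(m1, s1, n1, m2, s2, n2):
    """t statistic and Welch-Satterthwaite df, variances not assumed equal."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2   # squared standard errors per group
    std_err = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return (m1 - m2) / std_err, df

# (mean, standard deviation, n) for each group
male = (1283.4237, 3502.02542, 236)
female = (1002.7311, 1342.04673, 212)

t_equal, df_equal = pooled_t(*male, *female)
t_unequal, df_unequal = welch_t(*male, *female)
print(f"equal variances:   t = {t_equal:.3f}, df = {df_equal}")
print(f"unequal variances: t = {t_unequal:.3f}, df = {df_unequal:.3f}")
```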
Parametric Tests cont.
Two independent samples (proportions)
• We are testing the hypothesis that the proportion of heavy Internet users is the same for males and females.
H0: π1 = π2
H1: π1 ≠ π2
Sample data: Internet usage * Gender Crosstabulation (Count)
  Internet usage   Male   Female   Total
  Light            39     58       97
  Heavy            199    154      353
  Total            238    212      450
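Parametric Tests cont.
• With α = 0.05, the normal tables give Zcrit = 1.645 for a one-tailed test; for the two-tailed alternative stated in H1, the corresponding value is 1.96.
• The calculated Z exceeds the critical value, so we reject H0 and conclude that there is a difference in the proportion of heavy users of the Internet between males and females.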
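The Z statistic itself is not shown on the slide, but it can be computed from the crosstabulation of Internet usage by gender (199 of 238 males and 154 of 212 females are heavy users). The sketch below uses the usual pooled-proportion formula for testing the equality of two proportions; that choice of formula and the helper name `two_proportion_z` are assumptions, since the slide does not state how Z was obtained.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for H0: pi1 == pi2, using the pooled sample proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / std_err

# Heavy users: 199 of 238 males, 154 of 212 females.
z = two_proportion_z(199, 238, 154, 212)

reject_one_tailed = abs(z) > 1.645   # critical value quoted on the slide
reject_two_tailed = abs(z) > 1.96    # two-tailed critical value at alpha = 0.05
print(f"z = {z:.3f}")
```

Under these assumptions Z comes out at roughly 2.8, so H0 is rejected whichever of the two critical values is used.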
Non-Parametric Tests
Chi-square
H0: There is no association between Internet usage and age of respondents
H1: There is an association between Internet usage and age of respondents
Non-Parametric Tests cont.
Internet usage * Age of respondents Crosstabulation (Count)
  Internet usage   18-24   25-39   40-59   60 years or over   Total
  Light            22      17      44      14                 97
  Heavy            164     107     71      11                 353
  Total            186     124     115     25                 450
Chi-Square Tests
                                 Value     df   Asymp. Sig. (2-sided)
  Pearson Chi-Square             51.444a   3    .000
  Likelihood Ratio               47.450    3    .000
  Linear-by-Linear Association   43.858    1    .000
  N of Valid Cases               450
  a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 5.39.
• P-value < 0.05, hence reject H0 and conclude that there is an association between Internet usage and age of respondents.
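The Pearson Chi-square value and the minimum expected count in the footnote can both be recomputed from the cell counts. A minimal sketch, assuming the standard rule that each expected count is (row total × column total) / grand total; `chi_square_independence` is an illustrative helper name.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom, minimum expected count)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi_sq = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi_sq += (observed - expected) ** 2 / expected
            min_expected = min(min_expected, expected)
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi_sq, df, min_expected

# Rows: light, heavy users. Columns: 18-24, 25-39, 40-59, 60 or over.
usage_by_age = [
    [22, 17, 44, 14],
    [164, 107, 71, 11],
]
chi_sq, df, min_expected = chi_square_independence(usage_by_age)
print(f"chi-square = {chi_sq:.3f}, df = {df}, min expected = {min_expected:.2f}")
```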
Non-Parametric Tests cont.
Chi-square (another example)
• VU’s Open Day organisers are investigating whether visitors’ overall rating of Open Day is independent of the age of the visitor. Test at the .05 level of significance.
H0: Overall rating of Open Day and age are independent [no association]
H1: Overall rating of Open Day and age are not independent [association]
Non-Parametric Tests cont.
Overall rating of Open Day * Age of respondent Crosstabulation (Count)
  Overall rating   18 or under   19-29   Over 29   Total
  Poor             1             0       2         3
  Fair             6             2       1         9
  Good             24            15      4         43
  Very Good        100           25      12        137
  Excellent        66            29      18        113
  Total            197           71      37        305
Chi-Square Tests
                                 Value     df   Asymp. Sig. (2-sided)
  Pearson Chi-Square             18.369a   8    .019
  Likelihood Ratio               15.161    8    .056
  Linear-by-Linear Association   .023      1    .879
  N of Valid Cases               305
  a. 5 cells (33.3%) have expected count less than 5. The minimum expected count is .36.
• P-value < 0.05, hence reject H0 and conclude that there is an association between ratings of Open Day and age of respondents.
• Note that one third of the cells have expected counts below 5, so the Chi-square approximation is weak here and the conclusion should be treated with caution.
Chapter 13 Bivariate statistical analysis: Tests of differences
What is the appropriate test of difference?
• Do two groups differ with respect to some behaviour, characteristic, or attitude?
  – For example, are there differences in muscle gain between subjects in the experimental group, who are exposed to muscle supplements, and subjects in the control group?
  – Two different variables are involved.
• The appropriate test for comparing two independent groups is the independent samples t-test for difference of means.
The independent samples t-test for differences of means
• Used to test a hypothesis stating that the mean scores on a variable will be significantly different for two independent samples.
• A mean can only be taken of an interval-scaled or ratio-scaled variable.
  – The other variable is nominal scaled with two groups.
• Example: frequency of purchase between gender groups.
The independent samples t-test for differences of means
• The null hypothesis about differences between groups is as follows: µ1 = µ2 or µ1 – µ2 = 0
• The t-value is a ratio with information about the difference between means (provided by the sample) in the numerator and the standard error in the denominator.
\[ t = \frac{\text{Mean 1} - \text{Mean 2}}{\text{Variability of random means}} = \frac{\bar{X}_1 - \bar{X}_2}{S_{\bar{X}_1 - \bar{X}_2}} \]
  – where $S_{\bar{X}_1 - \bar{X}_2}$ is the pooled estimate of the standard error.
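For reference, the pooled estimate of the standard error is conventionally computed as below when the two population variances are assumed equal. The symbols $n_1$, $n_2$ (group sizes) and $S_1^2$, $S_2^2$ (sample variances) are notation introduced here for illustration and do not appear on the slide.

```latex
\[
S_{\bar{X}_1 - \bar{X}_2}
  = \sqrt{\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}
          \left(\frac{1}{n_1} + \frac{1}{n_2}\right)},
\qquad
\text{d.f.} = n_1 + n_2 - 2
\]
```

The first factor under the root is the pooled variance, a weighted average of the two sample variances; the second converts it into the variance of the difference between the two sample means.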
The independent samples t-test for differences of means
• From Table 13.1, we can calculate the pooled estimate of the standard error (1.304) and the t-statistic (2.341).
• Referring to Table B.3 in Appendix B, the critical t-value is 2.021.
• Since the calculated t-value exceeds the critical t-value, the null is rejected.
Conducting an independent samples t-test in SPSS
• Running an independent samples t-test in SPSS produces the results shown in Tables 13.2 and 13.3.
Conducting an independent samples t-test in SPSS
• Group means are displayed in Table 13.2.
  – But is the difference real, or did it occur by chance?
• The test of equality of variances is displayed in Table 13.3.
  – If the significance value is more than 0.05, we can assume equal variances.
• The test of equality of means and its significance value are displayed in Table 13.3.
  – If the significance value is less than 0.05, the two means are significantly different.
Analysis of variance (ANOVA)
• Used when comparing means of more than two groups or populations.
  – For example, comparing women working full-time outside the home, part-time outside the home, and full-time inside the home on willingness to purchase life insurance.
• Variances are used to allow us to compare means.
• The null hypothesis about differences between groups is as follows: µ1 = µ2 = µ3
Analysis of variance (ANOVA)
• If the grouping variable (i.e., working status) is responsible for differences in purchase intention for life insurance, then the variation in responses between the three groups will be comparatively larger than the variation in responses within each of the groups.
• The F-test is used to compare one sample variance with another.
  – F-ratio: the larger sample variance is divided by the smaller sample variance.
Analysis of variance (ANOVA)
• A larger ratio of variance between groups to variance within groups implies a greater F-ratio.
• The larger the F-ratio, the more likely it is that the differences in means result from the grouping variable.
• A calculated F-ratio that exceeds the critical F-ratio indicates that the results are statistically significant.
• In that case, the null hypothesis is rejected.
Conducting an ANOVA in SPSS
• Running an ANOVA in SPSS produces the results shown in Tables 13.7 and 13.8.
Conducting an ANOVA in SPSS
• Group means are displayed in Table 13.7.
  – But is the difference real, or did it occur by chance?
• Variances between and within groups are displayed in Table 13.8.
• The F-ratio and significance value are displayed in Table 13.8.
  – If the significance value is more than 0.05, the means are not significantly different.
    • Any observed difference is a result of sampling error or chance.
Nonparametric statistics for tests of differences
• So far, it has been necessary to assume that the population is normally distributed.
  – If it is normal, the error associated with making inferences can be estimated.
  – If it is not normal, the error may be large and cannot be estimated.
• Nonparametric tests overcome these limitations but have a greater probability of Type II error and require a larger sample size than a comparable parametric test.
Statistical and practical significance for tests of differences
• There is a distinction between statistical significance and practical significance.
• Even if there is a statistically significant difference, the practical difference might be very small.
  – Practical importance is also called substantive significance.
• Researchers need to keep this in mind when interpreting statistical output.
ANOVA
• Analysis of variance (ANOVA) examines the differences in the mean values of the dependent variable (interval scale) associated with the effect of the controlled independent variable (nominal scale), after taking into account the influence of the uncontrolled independent variables.
• Examples: Do the brand evaluations of groups exposed to different commercials vary? How do consumers’ intentions to buy the brand vary with different price levels?
One-way ANOVA
• We are testing to determine the effect of in-store promotion (X) on sales (Y).
H0: µ1 = µ2 = µ3
H1: at least one mean differs from the others
Descriptives (Sales)
  Promotion   N    Mean     Std. Deviation   Std. Error   95% CI Lower   95% CI Upper   Minimum   Maximum
  high        10   8.3000   1.33749          .42295       7.3432         9.2568         6.00      10.00
  medium      10   6.2000   1.75119          .55377       4.9473         7.4527         4.00      9.00
  low         10   3.7000   2.00278          .63333       2.2673         5.1327         1.00      7.00
  Total       30   6.0667   2.53164          .46221       5.1213         7.0120         1.00      10.00
One-way ANOVA cont.
Main effect (in-store promotion)
ANOVA (Sales)
                   Sum of Squares   df   Mean Square   F        Sig.
  Between Groups   106.067          2    53.033        17.944   .000
  Within Groups    79.800           27   2.956
  Total            185.867          29
• Reject H0: the means are not equal.
Multiple Comparisons (Dependent Variable: Sales, Tukey HSD)
  (I) In-store promotion   (J) In-store promotion   Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower   95% CI Upper
  high                     medium                   2.1000*                 .76884       .029   .1937          4.0063
  high                     low                      4.6000*                 .76884       .000   2.6937         6.5063
  medium                   high                     -2.1000*                .76884       .029   -4.0063        -.1937
  medium                   low                      2.5000*                 .76884       .008   .5937          4.4063
  low                      high                     -4.6000*                .76884       .000   -6.5063        -2.6937
  low                      medium                   -2.5000*                .76884       .008   -4.4063        -.5937
  *. The mean difference is significant at the .05 level.
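The sums of squares and the F-ratio in the ANOVA table can be rebuilt from each group's size, mean and standard deviation. The sketch below assumes the group summaries reported for the three promotion levels (n = 10 each; means 8.3, 6.2, 3.7; standard deviations 1.33749, 1.75119, 2.00278); `one_way_anova_from_summary` is an illustrative helper, not an SPSS routine.

```python
def one_way_anova_from_summary(groups):
    """groups: list of (n, mean, sd). Returns (ss_between, ss_within, F)."""
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n
    # Between-groups: how far each group mean sits from the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    # Within-groups: spread of observations around their own group mean.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_ratio

promotion = [
    (10, 8.3, 1.33749),   # high
    (10, 6.2, 1.75119),   # medium
    (10, 3.7, 2.00278),   # low
]
ss_b, ss_w, f_ratio = one_way_anova_from_summary(promotion)
print(f"SS between = {ss_b:.3f}, SS within = {ss_w:.3f}, F = {f_ratio:.3f}")
```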
One-way ANOVA cont.
Interpretation
• 57.1% (i.e. η² = 106.067/185.867) of the variation in sales is accounted for by in-store promotion, indicating a modest effect.
• The mean sales figures are different; that is, at least one pair of means is statistically different.
• All pairs of means are statistically different; therefore the level of in-store promotion affects sales.
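The share of variation explained is the between-groups sum of squares divided by the total sum of squares. A brief sketch using the sums of squares from the ANOVA output (106.067 between groups, 79.800 within groups); the variable names are illustrative.

```python
ss_between = 106.067
ss_within = 79.800
ss_total = ss_between + ss_within      # 185.867, matching the Total row

eta_squared = ss_between / ss_total    # proportion of variation explained
print(f"eta squared = {eta_squared:.3f}")
```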