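2. Purpose of the Chi Squared Test
The purpose of the chi squared test is to see whether observed experimental data are a 'good fit' to the results expected in theory. The χ² statistic is calculated in the following way:
χ² = Σ (O − E)² / E
where O is an observed frequency, E is the corresponding expected frequency, and the sum is taken over all classes.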
4. Notes on the Chi Squared Distribution
1.) ν (the number of degrees of freedom) = the number of classes − the number of restrictions.
2.) A restriction is any value that is derived from the observed data set (for example, the total frequency, or a parameter such as the mean that is estimated from the data).
3.) The chi squared distribution is continuous and therefore gives a poor approximation when frequencies are small. When calculating χ² we have to combine any classes whose expected frequency is less than 5.
4.) As with most statistical distributions, we do not need to calculate critical values by hand, as they are all tabulated for easy reference.
5. Lesson 1 - Example Question I
The table below shows the results when a die is rolled 120 times. Conduct a chi squared test at the 5% significance level to see whether or not the die is fair.

Score    1    2    3    4    5    6
Freq    15   29   14   18   20   24
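One possible way to work the die question is sketched below. It assumes the table reads as scores 1 to 6 with frequencies 15, 29, 14, 18, 20, 24, and it takes the 5% critical value for 5 degrees of freedom (11.070) from standard tables. The variable names are illustrative only.

```python
# Sketch: chi squared goodness-of-fit test for the die data.
# H0: the die is fair, so every score has probability 1/6.
observed = [15, 29, 14, 18, 20, 24]      # frequencies for scores 1..6
n = sum(observed)                        # 120 rolls in total
expected = [n / 6] * 6                   # 20 per score under H0

# chi squared = sum of (O - E)^2 / E over all classes
chi2_calc = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

df = len(observed) - 1                   # 6 classes, 1 restriction (the total)
critical_5pct = 11.070                   # tabulated value, 5 d.f., 5% level

print(f"chi2 = {chi2_calc:.2f} on {df} d.f.")
print("reject H0" if chi2_calc > critical_5pct else "do not reject H0")
```

Under these assumptions the calculated statistic is below the critical value, so there is no significant evidence at the 5% level that the die is unfair.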
6. Lesson 1 - Example Question II
The table below shows the results of an experiment in which four coins are thrown 160 times and the number of heads is recorded each time.
a.) Fit a Bin(4, ½) distribution to the data.
b.) Test the goodness of fit of the Bin(4, ½) model using a chi squared test at the 5% significance level.

Heads    0    1    2    3    4
Freq    15   46   54   35   10
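Parts a.) and b.) of the coin question could be worked along the following lines. This sketch assumes the table reads as 0 to 4 heads with frequencies 15, 46, 54, 35, 10. The critical value 9.488 (4 degrees of freedom, 5% level) is quoted from standard tables.

```python
from math import comb

# Sketch: fit Bin(4, 1/2) to the coin data, then test the fit.
observed = [15, 46, 54, 35, 10]          # frequencies for 0..4 heads
n = sum(observed)                        # 160 throws of four coins

# a.) Expected frequencies: n * C(4, k) * (1/2)^4 for k = 0..4
expected = [n * comb(4, k) * 0.5 ** 4 for k in range(5)]

# b.) chi squared = sum of (O - E)^2 / E
chi2_calc = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# p = 1/2 is given, not estimated, so the only restriction is the total.
df = len(observed) - 1
critical_5pct = 9.488                    # tabulated value, 4 d.f., 5% level

print("expected:", expected)
print(f"chi2 = {chi2_calc:.3f} on {df} d.f.")
print("reject H0" if chi2_calc > critical_5pct else "do not reject H0")
```

Every expected frequency is at least 5 here, so no classes need to be combined.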
7. Practice Questions
Statistics 3 and 4 by Jane Miller, page 121, Exercise 5A, Questions 1 and 3
10. Lesson 3 - Example Question
The table below shows the number of calls arriving at a switchboard in time intervals of 5 minutes. Test at the 5% significance level whether the Poisson distribution provides a good model for this data.

No. of calls    0    1    2    3    4 or more
Freq           71   23    4    2    0
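A sketch of one way to approach the switchboard question follows. It assumes the table reads as 0, 1, 2, 3, "4 or more" calls with frequencies 71, 23, 4, 2, 0, and that the Poisson mean is estimated from the data. Classes with small expected frequencies are merged into a single "2 or more" class, and the critical value 3.841 (1 degree of freedom, 5% level) is taken from standard tables.

```python
from math import exp

# Sketch: Poisson goodness-of-fit test for the switchboard data.
calls = [0, 1, 2, 3]                     # "4 or more" has frequency 0
freqs = [71, 23, 4, 2]
n = sum(freqs)                           # 100 intervals

# Estimate the Poisson mean from the data (this is a restriction).
mean = sum(c * f for c, f in zip(calls, freqs)) / n

p0 = exp(-mean)                          # P(X = 0)
p1 = mean * p0                           # P(X = 1)

# Expected frequencies for 2, 3 and 4+ calls are each below 5,
# so they are combined into one class, "2 or more".
observed = [71, 23, 4 + 2 + 0]
expected = [n * p0, n * p1, n * (1 - p0 - p1)]

chi2_calc = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# 3 classes, 2 restrictions (the total and the estimated mean).
df = len(observed) - 2
critical_5pct = 3.841                    # tabulated value, 1 d.f., 5% level

print(f"mean = {mean}, chi2 = {chi2_calc:.3f} on {df} d.f.")
print("reject H0" if chi2_calc > critical_5pct else "do not reject H0")
```

Because the mean is estimated from the sample, it counts as a second restriction, which is why only one degree of freedom remains after the classes are combined.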
11. Practice Questions
Statistics 3 and 4 by Jane Miller, page 121, Exercise 5A
Question 11, part a only (Poisson)
Question 2 (Ratio)
Question 12 (Geometric)
Question 13 (Poisson)
13. Lesson 4 - Example Question
The height in centimetres gained by a conifer in its first year after planting is denoted by the random variable H. The value of H is measured for a random sample of 86 conifers and the results obtained are summarised in the table below. Assuming that H is modelled by a N(50, 15²) distribution, test the goodness of fit of the model at the 5% level.

H           <35   35-45   45-55   55-65   >65
Obs Freq     10      18      28      18    12
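The conifer question could be checked with a short script such as the one below. It is a sketch that assumes the classes are <35, 35-45, 45-55, 55-65 and >65 with observed frequencies 10, 18, 28, 18, 12, and it uses the standard library's NormalDist for the class probabilities. The critical value 9.488 (4 degrees of freedom, 5% level) is quoted from standard tables.

```python
from statistics import NormalDist

# Sketch: goodness-of-fit test of N(50, 15^2) for the conifer data.
model = NormalDist(mu=50, sigma=15)
boundaries = [35, 45, 55, 65]            # class boundaries in cm
observed = [10, 18, 28, 18, 12]          # <35, 35-45, 45-55, 55-65, >65
n = sum(observed)                        # 86 conifers

# Cumulative probabilities at the boundaries, padded with 0 and 1
# so that the two open-ended classes are included.
cum = [0.0] + [model.cdf(b) for b in boundaries] + [1.0]
expected = [n * (hi - lo) for lo, hi in zip(cum, cum[1:])]

chi2_calc = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Mean and variance are given, so the only restriction is the total.
df = len(observed) - 1
critical_5pct = 9.488                    # tabulated value, 4 d.f., 5% level

print("expected:", [round(e, 2) for e in expected])
print(f"chi2 = {chi2_calc:.3f} on {df} d.f.")
print("reject H0" if chi2_calc > critical_5pct else "do not reject H0")
```

If the mean and variance had been estimated from the sample instead of being given, each estimate would count as an extra restriction and the degrees of freedom would fall accordingly.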
14. Practice Questions
Statistics 3 and 4 by Jane Miller, page 128, Exercise 5B, Question 1 onwards
16. Degrees of Freedom
The number of degrees of freedom in an h × k contingency table is given by ν = (h − 1) × (k − 1). For example, a 3 × 3 table has ν = 2 × 2 = 4.
17. Lesson 5 - Example Question I
Is income level independent of method of transport?

                 Method of Transport
Income Level     Car   Public   Self   Total
Small             58       21     36     115
Average          199       49     64     312
Large            205       32     29     266
Total            462      102    129     693
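A test of independence for the transport table might be sketched as follows. The slide does not state a significance level, so the 5% level is an assumption made here for illustration, with the critical value 9.488 (4 degrees of freedom) taken from standard tables. The table is read as rows Small, Average, Large and columns Car, Public, Self.

```python
# Sketch: chi squared test of independence for the 3 x 3 table.
# Rows: Small, Average, Large income. Columns: Car, Public, Self.
table = [
    [58, 21, 36],
    [199, 49, 64],
    [205, 32, 29],
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)                      # 693 people

# Under independence, E = (row total * column total) / grand total.
chi2_calc = 0.0
for row, r in zip(table, row_totals):
    for o, c in zip(row, col_totals):
        e = r * c / n
        chi2_calc += (o - e) ** 2 / e

df = (len(table) - 1) * (len(table[0]) - 1)   # (h - 1)(k - 1)
critical_5pct = 9.488                    # assumed 5% level, 4 d.f.

print(f"chi2 = {chi2_calc:.2f} on {df} d.f.")
print("reject H0" if chi2_calc > critical_5pct else "do not reject H0")
```

With these figures the statistic is far above the critical value, which would suggest that income level and method of transport are not independent.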
18. Lesson 5 - Example Question II
A university sociology department believes that students with a good grade in A Level General Studies tend to do well on sociology degree courses. To check this, it has collected information on a random sample of 100 students who had just graduated and who had also taken General Studies at A Level. The students' performance in General Studies was divided into two categories: those with grades A or B, and 'others'. Their degrees were recorded as Class I, II, III or fail. The data are given in the table below. Test at the 1% level the hypothesis that degree performance is independent of A Level performance in General Studies.

                Class I   Class II   Class III   Fail   Total
Grade A or B         11         22           6      1      40
Others                4         28          24      4      60
Total                15         50          30      5     100
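The General Studies question shows why classes sometimes have to be combined. The expected frequencies in the Fail column are 40 × 5 / 100 = 2 and 60 × 5 / 100 = 3, both below 5. The sketch below merges Class III with Fail before testing, which is one reasonable choice and not necessarily the one intended by the slide. The critical value 9.210 (2 degrees of freedom, 1% level) is quoted from standard tables.

```python
# Sketch: test of independence after combining small classes.
# Rows: Grade A or B, Others. Columns: Class I, II, III, Fail.
table = [
    [11, 22, 6, 1],
    [4, 28, 24, 4],
]

# Merge the last two columns (Class III and Fail) so that every
# expected frequency is at least 5.
merged = [row[:2] + [row[2] + row[3]] for row in table]

row_totals = [sum(row) for row in merged]
col_totals = [sum(col) for col in zip(*merged)]
n = sum(row_totals)                      # 100 students

expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2_calc = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(merged, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(merged) - 1) * (len(merged[0]) - 1)  # now a 2 x 3 table
critical_1pct = 9.210                    # tabulated value, 2 d.f., 1% level

print("expected:", expected)
print(f"chi2 = {chi2_calc:.3f} on {df} d.f.")
print("reject H0" if chi2_calc > critical_1pct else "do not reject H0")
```

Note that the degrees of freedom are calculated from the table after merging, so ν = (2 − 1) × (3 − 1) = 2.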
19. Yates' Correction
χ² is a continuous distribution, whereas the calculated statistic χ²calc is discrete. In the case of a 2 × 2 contingency table, for which ν = 1, the agreement between the two can be improved by applying a continuity correction called Yates' correction. This involves reducing each value of |O − E| by 0.5 before squaring:
χ²calc = Σ (|O − E| − 0.5)² / E
20. Lesson 5 - Example Question III
A random sample of 930 companies quoted on the stock exchange revealed the information summarised in the table below, which shows the distribution of these companies classified according to two attributes. In this table, D indicates that the company has diversified its product range during the previous financial year, and P indicates that there has been a significant rise in profits during the previous financial year. The null hypothesis is that D and P are independent. Show that this can be rejected at all reasonable significance levels.

           D   Not D
P        148     106
Not P    299     377
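The companies question is a 2 × 2 table, so Yates' correction applies. The sketch below assumes the table reads as P: 148 (D), 106 (Not D) and Not P: 299 (D), 377 (Not D). It compares the corrected statistic with the tabulated critical values for 1 degree of freedom at the 5%, 1% and 0.1% levels, which are taken here to represent "all reasonable significance levels".

```python
# Sketch: 2 x 2 test of independence with Yates' correction.
# Rows: P, Not P. Columns: D, Not D.
table = [
    [148, 106],
    [299, 377],
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)                      # 930 companies

# Reduce each |O - E| by 0.5 before squaring.
chi2_yates = 0.0
for row, r in zip(table, row_totals):
    for o, c in zip(row, col_totals):
        e = r * c / n
        chi2_yates += (abs(o - e) - 0.5) ** 2 / e

df = 1                                   # (2 - 1) x (2 - 1)
# Tabulated critical values for 1 d.f.
critical = {"5%": 3.841, "1%": 6.635, "0.1%": 10.828}

print(f"chi2 (Yates) = {chi2_yates:.2f} on {df} d.f.")
for level, value in critical.items():
    verdict = "reject H0" if chi2_yates > value else "do not reject H0"
    print(f"{level}: critical value {value}, {verdict}")
```

Under these assumptions the corrected statistic exceeds even the 0.1% critical value, which is consistent with the claim in the question.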