Lesson 2 Chi squared


Published in: Education, Technology

  1. 1.  Theory › Introduction › Worked Example › Application Past paper questions Further application & review
  2. 2. Content Detail
  3. 3.  Hypothesis testing involves making a conjecture (assumption) about some facet of our world, collecting data from a sample, performing a specific mathematical test, and then deciding whether or not the data support the conjecture. A conjecture must be stated in two parts:
      › The null hypothesis (H0) states that there is no significant relationship between the two variables being tested (they are “not related to” each other, i.e. independent).
      › The alternative hypothesis (H1) states that there is a significant relationship (they are “related” in some way, i.e. dependent).
      The only hypothesis test covered by the Studies SL course is the chi-squared test.
  4. 4.  The chi-squared test itself is quite straightforward: your GDC can do it in two steps, but you must also know the formula and be able to do it by hand. The hypothesis test which uses chi-squared determines whether or not two variables are related. It follows a general pattern:
      (1) Make a conjecture
      (2) Write the null hypothesis using “is not related to” or “independent”, and write the alternative hypothesis using “is related to” or “dependent”
      (3) Calculate the chi-squared test statistic
      (4) Determine reference values
      (5) Compare the two and either accept or reject the null hypothesis
  5. 5.  You can find chi-squared on your GDC by using the statistics mode:
      › Press F3 {Test}
      › Press F3 again for {Chi}
      Note: you must first have entered the data into Matrix A from Matrix mode!
  6. 6.  A researcher conjectures that seat belt usage, for drivers, is related to gender. She gathers data by recording seat belt usage at several randomly selected intersections. The data has been recorded in the table as shown.
                Seat belt usage
      Gender    Yes    No
      Female    50     25
      Male      40     45
      Construct a chi-square hypothesis test to determine if there is enough evidence to support the researcher’s conjecture.
      This type of table is called a contingency table. (It’s what you put in Matrix A on the GDC.)
  7. 7.  Exam hint – you must also be able to do a contingency table by hand. The largest to be tested will be 4 x 4.
      Since the conjecture has already been made we can start at step (2) – write the null and alternative hypotheses:
      › H0 – Seat belt usage is not related to gender
      › H1 – Seat belt usage is related to gender
                Seat belt usage
      Gender    Yes    No
      Female    50     25
      Male      40     45
      Step (3) – calculate chi-square:
      › Enter the data into Matrix A: from the Run screen press F1 {MAT}
      › Enter the dimensions for Matrix A, which are 2 x 2, then press EXE
      › Enter the contingency table data into the matrix, then press EXIT twice to go back to the RUN screen
      › In STAT mode, press F3 then F3 again
      › Highlight EXECUTE and press EXE
      GDC output (χ² Test): χ² = 6.22471211, p = 0.01259793, df = 1
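The GDC result for the seat belt example can be cross-checked away from the calculator. The following is a minimal sketch in plain Python, not the GDC's own routine: the function name `chi_squared_test` is hypothetical, and the `erfc` shortcut for the p-value is an assumption that only holds for df = 1 (a 2 x 2 table).

```python
import math


def chi_squared_test(observed):
    """Return (statistic, df, expected) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Expected frequency of a cell = row total x column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(observed, expected)
        for fo, fe in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


seat_belt = [[50, 25], [40, 45]]  # rows: Female, Male; columns: Yes, No
statistic, df, expected = chi_squared_test(seat_belt)

# Assumption: this closed form for the right-tail p-value holds only when df = 1.
p_value = math.erfc(math.sqrt(statistic / 2))

print(f"chi-squared = {statistic:.4f}, df = {df}, p = {p_value:.4f}")
```

The printed statistic, df and p-value should agree with the GDC screen to the digits shown. The `expected` list plays the same role as Matrix B on the GDC.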
  8. 8.  Step (4) – determine reference values
      › There are two reference values of importance: the p-value (which was calculated during the chi-square test) and the Critical Value (CV), which you read off the distribution table on your IB formula sheet.
      › In this case assume α = 0.01 (1%)
      Exam hint – the only significance levels that will be tested are 1%, 5% and 10%.
      Step (5) – make a comparison between either
      › p-value and the significance level, OR
      › Chi-square test statistic and the Critical Value
      If p-value > α level, then we can accept H0, i.e. there is not enough evidence to reject it.
      Here p-value > α level, since 0.0126 > 0.01. In other words, we accept the null hypothesis that there is no relationship between seat belt usage and gender.
  9. 9.  From what Lauren observed, she believes that the number of hours exercised per week is dependent on gender. She collected data randomly and organised the results in the table shown.
                Hours exercised per week
      Male      5     10    12
      Female    9     8     4
      Determine whether there is enough evidence to accept or reject the null hypothesis:
      › a) for α = 0.01
      › b) for α = 0.05
      › c) for α = 0.10
  10. 10.  Write the null and alternative hypotheses:
       › H0 – The number of hours exercised each week is independent of gender
       › H1 – The number of hours exercised each week is dependent on gender
                 Hours exercised per week
       Male      5     10    12
       Female    9     8     4
       Calculate chi-square and the p-value. GDC output (χ² Test): χ² = 4.69 (3sf), p = 0.0959 (3sf), df = 2
       Compare the p-value to each significance level:
       a) 0.0959 > 0.01, hence accept the null hypothesis
       b) 0.0959 > 0.05, hence accept the null hypothesis
       c) 0.0959 < 0.10, hence we reject the null hypothesis
       Whilst it is not technically correct to say “accept H0”, it is still accepted in the IB.
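The three comparisons for Lauren's data can be reproduced in a few lines. This is a sketch in plain Python under one stated assumption: the shortcut `exp(-x/2)` for the right-tail p-value is valid only for df = 2, which is the case for this 2 x 3 table. The variable names are illustrative, not from the slides.

```python
import math

observed = [[5, 10, 12], [9, 8, 4]]  # rows: Male, Female
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Add up (fo - fe)^2 / fe over every cell of the table.
statistic = 0.0
for r, row in zip(row_totals, observed):
    for c, fo in zip(col_totals, row):
        fe = r * c / grand_total
        statistic += (fo - fe) ** 2 / fe

df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Assumption: exp(-x/2) is the right-tail p-value only when df = 2.
p_value = math.exp(-statistic / 2)

decisions = {}
for alpha in (0.01, 0.05, 0.10):
    decisions[alpha] = "reject H0" if p_value < alpha else "accept H0"
    print(f"alpha = {alpha:.2f}: p = {p_value:.4f}, {decisions[alpha]}")
```

Only the 10% level leads to rejecting H0, which shows how the conclusion depends on the significance level chosen before the test.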
  11. 11.  This formula is on the IB formula sheet:
       $\chi^2_{calc} = \sum \frac{(f_o - f_e)^2}{f_e}$
       › fo is the observed frequencies (i.e. the raw data)
       › fe is the expected frequencies
       Exam hint – you are expected to be able to calculate the χ² test statistic with your GDC when raw data is given. You should also be able to perform an entire χ² hypothesis test without your GDC.
       * Don’t forget you can check your expected values using Matrix B.
       It is easiest to perform this sum calculation using a table, one step at a time.
  12. 12.  Remember that the steps below can be checked using your GDC, especially the expected values, using Matrix B.
       Completing a hypothesis test which uses chi-square by hand follows a similar process to the previous one, except some of the steps are much longer:
       (1) Make a conjecture (same as before)
       (2) Write the null hypothesis using “is not related to” or “independent”, and write the alternative hypothesis using “is related to” or “dependent” (same as before)
       (3) Calculate the chi-square test statistic (χ²)
       (4) Determine the reference value called the Critical Value (CV)
       (5) Compare the two and either accept or reject the null hypothesis
       Steps 3 & 4 are much longer!
  13. 13.  Step (3) has a series of sub-parts:
       (A) Expand the contingency table to include row totals, column totals and an overall total. The raw data in the table are called “observed values”.
       (B) Calculate the “expected values” for each cell in the table, based on the probabilities, using the totals of each row and column.
       (C) Organise the frequencies into a new table to calculate χ².
       Using Example 1 from before, part A is shown below.
                    Seat belt usage
       Gender       Yes    No     Row total
       Female       50     25     75
       Male         40     45     85
       Col total    90     70     160
  14. 14.  Using Example 1 from before, part B is shown below. The values in the table are the observed frequencies (fo).
                    Seat belt usage
       Gender       Yes    No     Row total
       Female       50     25     75
       Male         40     45     85
       Col total    90     70     160
       To calculate the expected frequencies (fe) in each cell we use the formula [Col total] x [Row total] / [Total sum].
       Expected frequencies (fe):
                 Yes                      No
       Female    90*75/160 = 42.1875      70*75/160 = 32.8125
       Male      90*85/160 = 47.8125      70*85/160 = 37.1875
       Remember to check these with Matrix B.
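The expected-frequency formula can be checked with a short calculation. This is a sketch in plain Python (the variable names are illustrative); it applies [Col total] x [Row total] / [Total sum] to every cell of the seat belt table.

```python
observed = [[50, 25], [40, 45]]  # rows: Female, Male; columns: Yes, No

row_totals = [sum(row) for row in observed]        # Female, Male
col_totals = [sum(col) for col in zip(*observed)]  # Yes, No
grand_total = sum(row_totals)

# fe = column total x row total / overall total, cell by cell.
expected = [[col * row / grand_total for col in col_totals] for row in row_totals]

for label, fe_row in zip(("Female", "Male"), expected):
    print(label, fe_row)
```

A useful check by hand is that the expected values in each row and each column add up to the same totals as the observed values.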
  15. 15.  Don’t round to 3sf during these calculations! Using Example 1 from before, part C is shown below.
       fo    fe         fo - fe    (fo - fe)²    (fo - fe)² / fe
       50    42.1875     7.8125    61.035        1.4468
       25    32.8125    -7.8125    61.035        1.8601
       40    47.8125    -7.8125    61.035        1.2766
       45    37.1875     7.8125    61.035        1.6413
                                   Sum =         6.22 (3sf)
       $\chi^2_{calc} = \sum \frac{(f_o - f_e)^2}{f_e} = 6.22$
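The table in part C can be built one column at a time in code, which mirrors the by-hand method. This is a sketch in plain Python with illustrative names; the expected values are computed from the totals (90, 70, 75, 85 and 160) and are kept unrounded until the final sum.

```python
observed = [50, 25, 40, 45]
# Expected values from [Col total] x [Row total] / [Total sum], left unrounded.
expected = [90 * 75 / 160, 70 * 75 / 160, 90 * 85 / 160, 70 * 85 / 160]

rows = []
for fo, fe in zip(observed, expected):
    diff = fo - fe
    rows.append((fo, fe, diff, diff ** 2, diff ** 2 / fe))

# The test statistic is the sum of the last column.
chi_squared = sum(row[4] for row in rows)

print("fo   fe        fo-fe     (fo-fe)^2   (fo-fe)^2/fe")
for fo, fe, diff, sq, contribution in rows:
    print(f"{fo:<4} {fe:<9.4f} {diff:<9.4f} {sq:<11.3f} {contribution:.4f}")
print(f"Sum = {chi_squared:.3g} (3sf)")
```

Rounding is applied only when printing, so the sum is not affected by rounding in the intermediate columns.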
  16. 16.  If you had a million dollars and you gave $1 away, how much would you say you had? (When does it become significant?)
       A critical value is a number which represents the boundary in determining whether a statistic is significant or not. That is, it separates the choice to accept or reject the null hypothesis.
       If the chi-squared test value falls to the left of (less than) the CV then we accept the null hypothesis (H0).
       If the chi-squared test value falls to the right of (greater than) the CV then we reject the null hypothesis (H0).
       The critical value is found using the distribution table on the IB formula sheet.
       › The left side column represents degrees of freedom (df) = (#cols - 1) * (#rows - 1)
       › The top row has the probability values p. Since the chi-squared test is a right-tail test we will always use the five right columns, and since the IB only uses significance levels of 0.01, 0.05 and 0.1 we will only ever need the corresponding columns p = 0.99, 0.95 and 0.9 (that is, p = 1 - α; this table heading is not the p-value from the test).
       For our example, p = 0.99, df = 1, hence CV = 6.635
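Reading the critical value from the table amounts to a lookup by df and significance level. The sketch below is plain Python with hypothetical names; the dictionary holds standard chi-squared critical values to three decimal places for df 1 to 9 (enough for tables up to 4 x 4), typed in here as an assumption rather than copied from the IB formula sheet.

```python
# Standard right-tail chi-squared critical values, keyed by df then alpha.
CRITICAL_VALUES = {
    1: {0.10: 2.706, 0.05: 3.841, 0.01: 6.635},
    2: {0.10: 4.605, 0.05: 5.991, 0.01: 9.210},
    3: {0.10: 6.251, 0.05: 7.815, 0.01: 11.345},
    4: {0.10: 7.779, 0.05: 9.488, 0.01: 13.277},
    5: {0.10: 9.236, 0.05: 11.070, 0.01: 15.086},
    6: {0.10: 10.645, 0.05: 12.592, 0.01: 16.812},
    7: {0.10: 12.017, 0.05: 14.067, 0.01: 18.475},
    8: {0.10: 13.362, 0.05: 15.507, 0.01: 20.090},
    9: {0.10: 14.684, 0.05: 16.919, 0.01: 21.666},
}


def degrees_of_freedom(n_rows, n_cols):
    """df = (#rows - 1) * (#cols - 1), not counting any total row or column."""
    return (n_rows - 1) * (n_cols - 1)


def critical_value(n_rows, n_cols, alpha):
    """Look up the CV for a contingency table of the given size."""
    return CRITICAL_VALUES[degrees_of_freedom(n_rows, n_cols)][alpha]


cv = critical_value(2, 2, 0.01)  # seat belt example: 2 x 2 table, alpha = 1%
print(f"df = {degrees_of_freedom(2, 2)}, CV = {cv}")
```

The keys are the significance levels α, so α = 0.01 here corresponds to the p = 0.99 column described above.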
  17. 17.  χ² = 6.22, CV = 6.635. Hence χ² < CV and we will accept the null hypothesis that there is no relationship between seat belt usage and gender.
       Remember: step (5) can be done in two ways – make a comparison between either
       › p-value and the significance level, OR
       › Chi-square test statistic and the Critical Value
       Previously we found p-value > α level, since 0.0126 > 0.01, and we accepted the null hypothesis. Can you spot the difference?
  18. 18.  If you are comparing the p-value with the α-level then if:
       › p > α → accept the null hypothesis
       › p < α → reject the null hypothesis
       If you are comparing χ² with the CV then if:
       › χ² < CV → accept the null hypothesis
       › χ² > CV → reject the null hypothesis
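The two decision rules can be written side by side to show that they lead to the same conclusion. This is a sketch in plain Python with hypothetical function names; the slides do not say what to do when the values are exactly equal, so treating equality as "accept H0" is an assumption made here.

```python
def decide_by_p_value(p_value, alpha):
    """p < alpha -> reject H0; otherwise accept (equality treated as accept: an assumption)."""
    return "reject H0" if p_value < alpha else "accept H0"


def decide_by_critical_value(statistic, cv):
    """statistic > CV -> reject H0; otherwise accept (equality treated as accept: an assumption)."""
    return "reject H0" if statistic > cv else "accept H0"


# Seat belt example: chi-squared = 6.2247, p = 0.0126, df = 1.
statistic, p_value = 6.2247, 0.0126

# Standard critical values for df = 1 at the three IB significance levels.
cv_for_alpha = {0.01: 6.635, 0.05: 3.841, 0.10: 2.706}

for alpha, cv in cv_for_alpha.items():
    by_p = decide_by_p_value(p_value, alpha)
    by_cv = decide_by_critical_value(statistic, cv)
    print(f"alpha = {alpha:.2f}: p-value method: {by_p}; CV method: {by_cv}")
```

At α = 0.01 both methods accept H0, as in the worked example. At 5% and 10% both methods reject H0, since the same data are being measured against a less strict boundary.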
  19. 19.  H&H 2nd Ed:
       › Exercise 20E.1 a-d (p615)
       › Worked example 8 (p618)
       › Exercise 20E.2
       › Exercise 20E.3 (p621)