# Statistics lecture 10(ch10)

Chi-squared (χ²) tests

Published in: Education

2. Objectives
   - Recognise a distribution to which a chi-square test can suitably be applied
   - Conduct a goodness-of-fit hypothesis test
   - Conduct a test of independence
   - Conduct a test of homogeneity
3. The chi-square distribution
   - It is positively (right) skewed and takes only non-negative values.
   - The tests are done on the right tail only, so every chi-square test has a single critical value.
   - The basic steps of a hypothesis test are the same as before; only the test statistic and its distribution have changed.
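To see what "right tail only" means numerically, here is a small sketch (not part of the lecture) that computes the upper-tail area $P(\chi^2 > x)$ with the Python standard library only. It assumes the standard series for the regularised incomplete gamma function; the function name `chi2_upper_tail` is hypothetical. Evaluated at the table critical values used later in this lecture, the upper-tail area should come out close to the stated α.

```python
import math


def chi2_upper_tail(x: float, df: int) -> float:
    """Right-tail area P(X > x) for X ~ chi-square(df), standard library only."""
    if x <= 0:
        return 1.0  # the chi-square distribution has no mass below 0
    a, z = df / 2.0, x / 2.0
    # Series for the regularised lower incomplete gamma function P(a, z):
    # P(a, z) = z^a e^(-z) / Gamma(a) * sum_n z^n / (a (a+1) ... (a+n))
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)  # upper tail = 1 - lower tail


# Table critical values quoted in this lecture: (value, df, alpha)
for crit, df, alpha in [(5.9917, 2, 0.05), (11.07, 5, 0.05), (9.49, 4, 0.05),
                        (16.81, 6, 0.01), (20.09, 8, 0.01)]:
    print(f"df={df}: P(chi2 > {crit}) = {chi2_upper_tail(crit, df):.4f} (alpha = {alpha})")
```

In practice a printed table or a statistics package supplies these values; the point is only that each critical value cuts off an area of α in the right tail.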
4. The techniques used up to now analysed data measured on a quantitative scale.
   - Results can often be classified into categories that have no natural order. This gives:
     - a categorical variable
     - categories
     - categorical data
   - Categorical data can be analysed with chi-squared tests, provided that:
     - the data come from a simple random sample, and
     - the sample size is reasonably large.
5. Example: a survey of job satisfaction
   - Employed persons are classified as satisfied, neutral or dissatisfied.
   - Categorical variable: employee satisfaction
   - Categories: satisfied, neutral, dissatisfied
   - Categorical data: the number of employees who are satisfied, neutral or dissatisfied (also referred to as the frequency of each category)
6. Examples
   1. A person's income can be categorised as high, medium or low. Define the categorical variable, the categories and the categorical data.
   2. We want to investigate different types of industries, e.g. information technology, financial and transformation. Define the categorical variable, the categories and the categorical data.
7. Example answers
   1. The categorical variable is income. The categories are high, medium and low. The categorical data are the numbers of people who have high, medium or low income.
   2. The categorical variable is type of industry. The categories are information technology, financial and transformation. The categorical data are the numbers of industries that are information technology, financial or transformation.
8. Chi-squared goodness-of-fit test
   - This test describes a single population of categorical data.
   - The multinomial experiment studied is an extension of the binomial experiment:
     - There are $n$ independent trials.
     - The outcome of each trial can be classified into one of $k$ categories (cells).
     - The probability $p_i$ of cell $i$ remains constant for each trial, and $p_1 + p_2 + \dots + p_k = 1$.
   - The experiment records the observed frequency of each category (the number of trials that fall into it), denoted by $f_1, f_2, \dots, f_k$, where $f_1 + f_2 + \dots + f_k = n$.
9. Example

   In a box of smarties you will find 6 different colours: brown, red, yellow, blue, orange and green. A random sample of smarties (6918 in total) was taken and the frequency of each colour was counted. The distribution of colours is given below.

   | Colour | Brown | Red | Yellow | Blue | Orange | Green |
   |---|---|---|---|---|---|---|
   | $f$ | 1611 | 1172 | 1308 | 904 | 921 | 1002 |

   Determine whether the smarties survey fits the description of a multinomial experiment.
10. Example answer: see Example 10.1, p. 350, textbook.
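Each smartie is one trial that lands in exactly one of $k = 6$ colour categories, so the observed frequencies must add up to $n$. The short Python sketch below (an illustration only, not the textbook's answer) checks that bookkeeping and prints the sample proportions $f_i/n$.

```python
# Colour counts from the smarties sample above
counts = {"brown": 1611, "red": 1172, "yellow": 1308,
          "blue": 904, "orange": 921, "green": 1002}

n = sum(counts.values())  # number of trials (smarties sampled)
k = len(counts)           # number of categories (colours)
# Sample proportions f_i / n; these describe the sample only
proportions = {colour: f / n for colour, f in counts.items()}

print(f"n = {n}, k = {k}")
for colour, p in proportions.items():
    print(f"{colour:>6}: {p:.3f}")
```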
11. Using the χ²-tests
   - The goodness-of-fit test is used to determine whether the observed counts of the categories agree with the probabilities specified for each category.
   - Observed frequencies ($f$) are compared with the expected frequencies ($e$); all expected frequencies must be at least 5.

   | Testing | Alternative hypothesis | Decision rule: reject $H_0$ if … | Test statistic |
   |---|---|---|---|
   | $H_0$: the proportions agree with the specified probabilities | $H_1$: $H_0$ is not true | $\chi^2 > \chi^2_{k-1;\,1-\alpha}$ | $\chi^2 = \sum \dfrac{(f_i - e_i)^2}{e_i}$ |
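The goodness-of-fit procedure above can be sketched in Python as follows. This is an illustrative sketch assuming plain Python 3 lists; the function name `goodness_of_fit` is hypothetical. It computes $e_i = np_i$, enforces the rule that every expected frequency is at least 5, and returns the statistic together with its $k - 1$ degrees of freedom. The critical value still has to be looked up in a table.

```python
def goodness_of_fit(observed, probabilities):
    """Return (chi-square statistic, degrees of freedom) for a goodness-of-fit test."""
    if len(observed) != len(probabilities):
        raise ValueError("need one specified probability per category")
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("specified probabilities must sum to 1")
    n = sum(observed)
    expected = [n * p for p in probabilities]  # e_i = n * p_i
    if min(expected) < 5:
        raise ValueError("all expected frequencies must be at least 5")
    statistic = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
    return statistic, len(observed) - 1  # df = k - 1
```

For example, `goodness_of_fit([82, 102, 16], [0.40, 0.45, 0.15])` gives a statistic of about 8.18 with 2 degrees of freedom, which agrees with the detergent example in this lecture.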
12. Example
   - A household detergent is marketed in three sizes: 1 000 ml, 750 ml and 250 ml.
   - The distributors believe that the market shares of the different sizes are as follows:
     - 1 000 ml: 40%
     - 750 ml: 45%
     - 250 ml: 15%
   - To study the effect of the economic climate on the sales of the products, 200 customers were asked which size they would prefer. Survey results:
     - 82 customers preferred the 1 000 ml
     - 102 customers preferred the 750 ml
     - 16 customers preferred the 250 ml
13. Solution
   - The population investigated is the customers' size preferences.
   - The data are categorical.
   - This is a multinomial experiment with three categories.
   - The question of interest: do $p_1$, $p_2$ and $p_3$ differ from the specified 40%, 45% and 15%?
14. The hypotheses are:
   - $H_0$: $p_1 = 0{,}40$, $p_2 = 0{,}45$, $p_3 = 0{,}15$
   - $H_1$: at least one $p_i$ is not equal to its specified value.
   - Are the observed and the expected frequencies the same? The expected frequencies are $e_i = np_i$, and all are at least 5:

   | Size | Specified share | Observed $f_i$ | Expected $e_i = np_i$ |
   |---|---|---|---|
   | 1 000 ml | 40% | 82 | 40% of 200 = 80 |
   | 750 ml | 45% | 102 | 45% of 200 = 90 |
   | 250 ml | 15% | 16 | 15% of 200 = 30 |
16. The hypotheses are:
   - $H_0$: $p_1 = 0{,}40$, $p_2 = 0{,}45$, $p_3 = 0{,}15$
   - $H_1$: at least one $p_i$ is not equal to its specified value.
   - $\alpha = 0{,}05$; critical value $\chi^2_{k-1;\,1-\alpha} = \chi^2_{2;\,0{,}95} = 5{,}9917$. Reject $H_0$ if $\chi^2 > 5{,}9917$.

   $$\chi^2 = \sum \frac{(f_i - e_i)^2}{e_i} = \frac{(82 - 80)^2}{80} + \frac{(102 - 90)^2}{90} + \frac{(16 - 30)^2}{30} = 8{,}18$$

   - Since 8,18 > 5,9917, reject $H_0$.
   - Conclusion: At the 5% significance level there is sufficient evidence to reject the null hypothesis. At least one of the probabilities $p_i$ differs from its specified value. Because the shares must sum to 1, at least two market shares have changed.
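The arithmetic of the detergent test can be reproduced with a few lines of Python. This is a sketch assuming the survey counts and claimed shares exactly as given above; the critical value is typed in from the table rather than computed. Printing each category's contribution shows where the statistic comes from.

```python
# Survey of 200 customers against the distributors' claimed market shares
observed = {"1 000 ml": 82, "750 ml": 102, "250 ml": 16}
claimed = {"1 000 ml": 0.40, "750 ml": 0.45, "250 ml": 0.15}

n = sum(observed.values())
contributions = {}
for size, f in observed.items():
    e = n * claimed[size]  # expected frequency e_i = n * p_i
    contributions[size] = (f - e) ** 2 / e

chi2 = sum(contributions.values())
critical = 5.9917  # table value chi2_{2; 0.95}
for size, c in contributions.items():
    print(f"{size}: {c:.4f}")
print(f"chi2 = {chi2:.2f}, reject H0: {chi2 > critical}")
```

Most of the 8,18 comes from the 250 ml category, where 16 customers chose that size against 30 expected.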
17. Two friends were playing a board game in which a die played a big role. One of the players believed that the die was not fair. Sixty tosses of the die produced the results below. Test at the 5% significance level whether the die was fair.

   | Number of dots | 1 | 2 | 3 | 4 | 5 | 6 |
   |---|---|---|---|---|---|---|
   | Number of tosses | 7 | 6 | 7 | 18 | 15 | 7 |
18. Solution
   - Expected frequencies: $e_i = np_i = 60 \times \tfrac{1}{6} = 10$ for each of the six categories.
   - $H_0$: $p_1 = \dots = p_6 = 1/6$; $H_1$: at least one $p_i \neq 1/6$; $\alpha = 0{,}05$.

   $$\chi^2 = \sum \frac{(f - e)^2}{e} = \frac{(7 - 10)^2}{10} + \dots + \frac{(7 - 10)^2}{10} = 13{,}2$$

   - Critical value: $\chi^2_{k-1;\,1-\alpha} = \chi^2_{5;\,0{,}95} = 11{,}07$.
   - Since 13,2 > 11,07, reject $H_0$. The probabilities of the numbers of dots are not all equal, so the die was not fair.
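The same computation for the die, as a minimal Python sketch assuming the 60 recorded tosses above and the table value 11,07:

```python
tosses = [7, 6, 7, 18, 15, 7]  # counts for 1..6 dots
n = sum(tosses)                # 60 tosses
expected = n / 6               # fair die: e_i = 60 * 1/6 = 10
chi2 = sum((f - expected) ** 2 / expected for f in tosses)
df = len(tosses) - 1           # k - 1 = 5
critical = 11.07               # table value chi2_{5; 0.95}
print(f"chi2 = {chi2:.1f} on {df} df; reject H0 (fair die): {chi2 > critical}")
```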
19. Chi-squared test for independence
   - Two categorical variables are cross-classified in a contingency table.
   - The rows represent one variable and the columns represent the other.
   - Each cell value gives the frequency of that cross-classification.
   - The table can have any number of rows and columns: an $r \times c$ table has $r \times c$ cells.
20. Concept questions: Questions 1–3, p. 356
21. Chi-squared test for independence
   - Contingency tables describe the relationship between two categorical variables.
   - $H_0$: the two variables are independent (there is no relationship).
   - $H_1$: the two variables are dependent (there is a relationship).
   - For a 2×2 contingency table, the observed frequencies are:

   | A | $B_1$ | $B_2$ | Total |
   |---|---|---|---|
   | $A_1$ | $f_{11}$ | $f_{12}$ | $r_1$ |
   | $A_2$ | $f_{21}$ | $f_{22}$ | $r_2$ |
   | Total | $c_1$ | $c_2$ | $n$ |

   - For each observed frequency an expected frequency must be calculated:

   $$e = \frac{\text{row total} \times \text{column total}}{n}$$

   $$e_{11} = \frac{r_1 c_1}{n}, \quad e_{12} = \frac{r_1 c_2}{n}, \quad e_{21} = \frac{r_2 c_1}{n}, \quad e_{22} = \frac{r_2 c_2}{n}$$
24. Chi-squared test for independence
   - $H_0$: the two variables are independent (there is no relationship).
   - $H_1$: the two variables are dependent (there is a relationship).

   | Testing | Alternative hypothesis | Decision rule: reject $H_0$ if … | Test statistic |
   |---|---|---|---|
   | $H_0$: variables are independent | $H_1$: variables are dependent | $\chi^2 > \chi^2_{(r-1)(c-1);\,1-\alpha}$ | $\chi^2 = \sum \dfrac{(f - e)^2}{e}$ |
25. Example
   - A household detergent is marketed in three sizes: 1 000 ml, 750 ml and 250 ml.
   - The market of potential buyers is divided into three age groups: < 30 years old, 30–50 years old and > 50 years old.
   - Market researchers believe that there is a relationship between the age of a buyer and the size of the packaging.
26. Solution
   - The data are summarised in a 3×3 contingency table of observed frequencies.
   - $H_0$: size and age are independent.
   - $H_1$: size and age are dependent.

   | Size | Age < 30 | Age 30–50 | Age > 50 | Total |
   |---|---|---|---|---|
   | 1 000 ml | 27 | 41 | 14 | 82 |
   | 750 ml | 39 | 18 | 45 | 102 |
   | 250 ml | 8 | 2 | 6 | 16 |
   | Total | 74 | 61 | 65 | 200 |
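27. Solution
   - Calculate the expected frequency of each cell as (row total × column total)/$n$; for example, for 1 000 ml and age < 30: $(74 \times 82)/200 = 30{,}34$.
   - Observed frequencies, with expected frequencies in parentheses:

   | Size | Age < 30 | Age 30–50 | Age > 50 | Total |
   |---|---|---|---|---|
   | 1 000 ml | 27 (30,34) | 41 (25,01) | 14 (26,65) | 82 |
   | 750 ml | 39 (37,74) | 18 (31,11) | 45 (33,15) | 102 |
   | 250 ml | 8 (5,92) | 2 (4,88) | 6 (5,20) | 16 |
   | Total | 74 | 61 | 65 | 200 |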
28. The hypotheses are:
   - $H_0$: size and age are independent.
   - $H_1$: size and age are dependent.
   - $\alpha = 0{,}05$; critical value $\chi^2_{(r-1)(c-1);\,1-\alpha} = \chi^2_{(3-1)(3-1);\,0{,}95} = \chi^2_{4;\,0{,}95} = 9{,}49$.

   $$\chi^2 = \sum \frac{(f - e)^2}{e} = \frac{(27 - 30{,}34)^2}{30{,}34} + \frac{(41 - 25{,}01)^2}{25{,}01} + \dots + \frac{(6 - 5{,}20)^2}{5{,}20} = 28{,}95$$

   - Since 28,95 > 9,49, reject $H_0$.
   - Conclusion: At the 5% significance level there is sufficient evidence to reject the null hypothesis. There is a relationship between the size of detergent that people prefer and their age.
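The expected frequencies and the statistic above can be recomputed with a short Python sketch. The helper `independence_test` is a hypothetical name, and the table critical value 9,49 is typed in, not computed. Besides the statistic, the sketch reports the smallest expected frequency. In this table it is 4,88 (the 250 ml, 30–50 cell), which is just below the rule of thumb that every expected frequency should be at least 5.

```python
def independence_test(table):
    """Return (chi-square statistic, df, expected counts) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected frequency: (row total x column total) / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum((f - e) ** 2 / e
                    for obs_row, exp_row in zip(table, expected)
                    for f, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, expected


# Rows: 1 000 ml, 750 ml, 250 ml; columns: age < 30, 30-50, > 50
detergent = [[27, 41, 14],
             [39, 18, 45],
             [8, 2, 6]]
stat, df, expected = independence_test(detergent)
critical = 9.49  # table value chi2_{4; 0.95}
print(f"chi2 = {stat:.2f} on {df} df; reject H0: {stat > critical}")
print("smallest expected count:", round(min(min(row) for row in expected), 2))
```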
29. A recent survey of marketing managers in four different industries provided the data in the table below, which gives managers' attitudes to market research and its value in marketing decision making.

   | Perceived value of market research | Consumer businesses | Industrial organisations | Retail & wholesale | Finance & insurance |
   |---|---|---|---|---|
   | Little value | 9 | 22 | 13 | 9 |
   | Moderate value | 29 | 41 | 6 | 17 |
   | Great value | 26 | 28 | 6 | 27 |
   | Total | 64 | 91 | 25 | 53 |

   Test at the 1% level of significance whether managers' perception of the value of market research is dependent on the type of industry in which a marketing manager is employed.
30. Solution
   - Observed frequencies, with expected frequencies in parentheses:

   | Perceived value of market research | Consumer businesses | Industrial organisations | Retail and wholesale | Finance and insurance | Total |
   |---|---|---|---|---|---|
   | Little value | 9 (14,56) | 22 (20,7) | 13 (5,69) | 9 (12,06) | 53 |
   | Moderate value | 29 (25,55) | 41 (36,32) | 6 (9,98) | 17 (21,15) | 93 |
   | Great value | 26 (23,9) | 28 (33,98) | 6 (9,33) | 27 (19,79) | 87 |
   | Total | 64 | 91 | 25 | 53 | 233 |

   - $H_0$: managers' perception is independent of industry type.
   - $H_1$: managers' perception is dependent on industry type.
   - $\alpha = 0{,}01$

   $$\chi^2 = \sum \frac{(f - e)^2}{e} = \frac{(9 - 14{,}56)^2}{14{,}56} + \dots + \frac{(27 - 19{,}79)^2}{19{,}79} = 20{,}895$$

   - Critical value: $\chi^2_{(r-1)(c-1);\,1-\alpha} = \chi^2_{6;\,0{,}99} = 16{,}81$.
   - Since 20,895 > 16,81, reject $H_0$. Managers' perception is dependent on the industry type.
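Recomputing this table in Python, with a hypothetical `independence_test` helper defined here so the block stands alone and the critical value 16,81 typed in from the table, gives a statistic of about 20,91 rather than 20,895. The small difference arises because the expected frequencies above are rounded before squaring; the decision to reject $H_0$ is the same either way.

```python
def independence_test(table):
    """Return (chi-square statistic, df, expected counts) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Unrounded expected frequencies: (row total x column total) / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum((f - e) ** 2 / e
                    for obs_row, exp_row in zip(table, expected)
                    for f, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, expected


# Rows: little, moderate, great value
# Columns: consumer, industrial, retail & wholesale, finance & insurance
survey = [[9, 22, 13, 9],
          [29, 41, 6, 17],
          [26, 28, 6, 27]]
stat, df, expected = independence_test(survey)
critical = 16.81  # table value chi2_{6; 0.99}
print(f"chi2 = {stat:.3f} on {df} df; reject H0: {stat > critical}")
```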
31. Questions 4–6, p. 361, textbook
32. Chi-squared test of homogeneity
   - Tests whether two or more populations are homogeneous (similar) with regard to a certain characteristic.
   - $H_0$: the proportion of elements with a certain characteristic is the same in two or more different populations.
   - $H_1$: the proportion of elements with a certain characteristic is not the same in two or more different populations.
   - The rest of the test is the same as the test for independence.
33. An immigration attorney was investigating which industries to target for obtaining new clients who might have problems with changes in the immigration laws. The lawyer selected five industries, randomly selected twenty workers in each industry and verified their visa statuses.

   | Visa status | Industry A | Industry B | Industry C | Industry D | Industry E |
   |---|---|---|---|---|---|
   | Illegal resident | 8 | 10 | 5 | 10 | 1 |
   | Legal resident | 4 | 2 | 6 | 4 | 9 |
   | SA citizen | 8 | 8 | 9 | 6 | 10 |

   Test at the 1% level of significance whether the five industries are homogeneous with respect to the visa status of their workers.
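34. Solution
   - Observed frequencies, with expected frequencies in parentheses:

   | Visa status | A | B | C | D | E | Total |
   |---|---|---|---|---|---|---|
   | Illegal resident | 8 (6,8) | 10 (6,8) | 5 (6,8) | 10 (6,8) | 1 (6,8) | 34 |
   | Legal resident | 4 (5) | 2 (5) | 6 (5) | 4 (5) | 9 (5) | 25 |
   | SA citizen | 8 (8,2) | 8 (8,2) | 9 (8,2) | 6 (8,2) | 10 (8,2) | 41 |
   | Total | 20 | 20 | 20 | 20 | 20 | 100 |

   - $H_0$: the five industries are homogeneous with respect to the visa status of their workers.
   - $H_1$: the five industries are heterogeneous with respect to the visa status of their workers.
   - $\alpha = 0{,}01$

   $$\chi^2 = \sum \frac{(f - e)^2}{e} = \frac{(8 - 6{,}8)^2}{6{,}8} + \dots + \frac{(10 - 8{,}2)^2}{8{,}2} = 15{,}32$$

   - Critical value: $\chi^2_{(r-1)(c-1);\,1-\alpha} = \chi^2_{8;\,0{,}99} = 20{,}09$.
   - Since 15,32 < 20,09, do not reject $H_0$. At the 1% level there is not sufficient evidence that the industries differ, so the five industries can be treated as homogeneous with respect to the visa status of their workers.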
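Because the homogeneity test uses the same mechanics as the test for independence, the same kind of sketch applies. The helper `independence_test` is hypothetical and is defined here so the block stands alone; the critical value 20,09 is typed in from the table.

```python
def independence_test(table):
    """Return (chi-square statistic, df, expected counts) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected frequency: (row total x column total) / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum((f - e) ** 2 / e
                    for obs_row, exp_row in zip(table, expected)
                    for f, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, expected


# Rows: illegal resident, legal resident, SA citizen; columns: industries A-E
visa = [[8, 10, 5, 10, 1],
        [4, 2, 6, 4, 9],
        [8, 8, 9, 6, 10]]
stat, df, expected = independence_test(visa)
critical = 20.09  # table value chi2_{8; 0.99}
print(f"chi2 = {stat:.2f} on {df} df; reject H0 (homogeneity): {stat > critical}")
```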
35. Classwork/homework
   1. Activities 1, 2, 3, 4: pp. 168–174, Module Manual
   2. Revision exercises 1, 2, 3, 4: pp. 174–176, Module Manual
   3. Self-review test 1–4: p. 368, textbook
   4. Supplementary exercises 1–11: p. 370, textbook