Stats 3000 Week 1 - Winter 2011

Inferences about categorical data. Many experiments result in measurements that are qualitative or categorical rather than quantitative. Examples: people classified by ethnic origin; cars classified by color; M&M®s classified by type (plain or peanut).

A goodness-of-fit test is an inferential procedure used to determine whether a frequency distribution in the population follows a specific distribution (for example, no preference, i.e. a uniform distribution, or no difference from another known population). It is used to make inferences about the population frequency distribution from the sample frequency distribution. It determines how well a frequency distribution for a sample fits the distribution predicted by the null hypothesis (about the population frequency distribution).

We consider sample data consisting of observed frequency counts arranged in a single row or column (called a one-way frequency table or one dimensional table). We will test the claim that the observed frequency counts agree with some claimed distribution (specified by the null hypothesis), so that there is a good fit of the observed data with the claimed distribution.

Expected frequency counts. The expected frequency count at each level of the categorical variable is equal to the sample size times the hypothesized proportion (in the population) at that level from the null hypothesis: E = np, where E is the expected frequency count for a specific level of the categorical variable, n is the total sample size, and p is the hypothesized proportion of observations for that level of the categorical variable.
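As a rough illustration of E = np, the sketch below computes expected counts under a no-preference null hypothesis. The sample size (100) and the four equally likely categories are assumptions made for this example, not values from the course material, and `expected_counts` is a hypothetical helper name.

```python
def expected_counts(n, proportions):
    """Return the expected frequency count E = n * p for each category."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("hypothesized proportions must sum to 1")
    return [n * p for p in proportions]

# Hypothetical example: n = 100 observations, four equally likely categories.
expected = expected_counts(100, [0.25, 0.25, 0.25, 0.25])
print(expected)
```

Under a uniform null hypothesis every category gets the same expected count; a null hypothesis based on another known population would simply pass different proportions.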

Goodness-of-fit hypothesis tests are always right-tailed.

Observed Frequencies Does the distribution specified by the null correspond to the observed distribution?

A large disagreement between observed and expected values will lead to a large value of χ².
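To see why disagreement inflates the statistic, the sketch below uses the standard Pearson form, the sum over categories of (O - E)² / E, where O is an observed count and E the matching expected count. Both sets of observed counts are made up for illustration, and `chi_square_statistic` is a hypothetical helper name.

```python
def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

expected = [25.0, 25.0, 25.0, 25.0]   # hypothetical: n = 100, no preference

# Hypothetical observed counts close to the expected counts.
close_fit = chi_square_statistic([30, 20, 25, 25], expected)

# Hypothetical observed counts far from the expected counts.
poor_fit = chi_square_statistic([55, 10, 20, 15], expected)

print(close_fit, poor_fit)
```

Because each difference is squared, deviations in either direction add to the statistic, which is why only large values count as evidence against the null hypothesis.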

“If the P is low (small), the null must go.”

A significantly large value of χ² (small/low P-value) will cause a rejection of the null hypothesis of no difference between the observed and the expected frequencies.

P-Values. Obtain an approximate P-value by determining the area under the chi-square distribution with k-1 degrees of freedom (where k is the number of categories) to the right of the test statistic, using Appendix 2, page 697, OR use computer software to obtain a more precise P-value. The area to the right of a χ² value is the P-value. It gives the probability of a value at least this extreme if the null hypothesis were true.
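As one way to get a software P-value without a table, the sketch below computes the right-tail area of the chi-square distribution using only the Python standard library. The helper `chi2_sf` is a hypothetical name, and the test statistic (2.0 with k = 4 categories) is an assumed value for illustration; statistical packages provide equivalent functions.

```python
import math

def chi2_sf(x, df):
    """Right-tail area of the chi-square distribution with df degrees of freedom.

    Starts from a closed form (df = 1 or df = 2) and steps the degrees of
    freedom up by two using the recurrence for the regularized upper
    incomplete gamma function.
    """
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)               # df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))    # df = 1
    while a < df / 2.0:
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q

# Hypothetical goodness-of-fit result: statistic 2.0 with k = 4 categories.
k = 4
p_value = chi2_sf(2.0, k - 1)
print(round(p_value, 4))
```

A P-value this large would not lead to rejecting the null hypothesis at any common significance level.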

Relationships Among the χ² Test Statistic, P-Value, and Goodness-of-Fit

Does the distribution of subjects across the levels of one variable change as a function of levels of the other variable?

Depends on the subject matter

Contingency Tables. Contingency tables are described by the number of levels of each categorical variable. E.g., a 2 × 2 table summarizes frequency counts for two dichotomous variables (e.g., male/female, smoker/nonsmoker). The number of cells in the table is the product of the number of levels for each variable: a 2 × 2 table has 4 cells; a 3 × 3 table has 9 cells.

The null hypothesis is that the variables are not associated; in other words, they are independent. The alternative hypothesis is that the variables are associated, or dependent.

The idea behind the test of independence is to compare actual frequency counts to the counts we would expect if the null hypothesis were true. If a significant difference between the actual counts and expected counts exists, we would take this as evidence against the null hypothesis (stating that the variables are independent).

A researcher wanted to investigate whether there was a relationship between personality type (introvert, extrovert) and choice of recreational activity (going to an amusement park, taking a one-day meditation retreat).

We want to know whether personality and preferred recreational activity are dependent or independent so the hypotheses are: H0: personality and preferred recreational activity are independent H1: personality and preferred recreational activity are dependent

Expected Frequencies in a Chi-Square Test for Independence. To find the expected frequency in a cell when performing a chi-square independence test, multiply the row total of the row containing the cell by the column total of the column containing the cell and divide this result by the grand total. That is, E = (row total × column total) / grand total.

The test statistic has (R-1)(C-1) degrees of freedom, where R is the number of rows and C is the number of columns in the contingency table.
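The row-total-times-column-total rule can be sketched as follows. The cell counts are taken from the fractions quoted later in these notes (12/40 and 28/40 for introverts, 43/60 and 17/60 for extroverts); the table layout and the helper name `expected_table` are assumptions made for the example.

```python
def expected_table(observed):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Rows: introvert, extrovert. Columns: amusement park, retreat.
observed = [[12, 28],
            [43, 17]]

expected = expected_table(observed)

# Degrees of freedom: (R - 1)(C - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(expected, df)
```

Each row of the expected table keeps the same row total as the observed table, and each column keeps the same column total; only the way counts are spread across cells changes.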

There are R = 2 rows and C = 2 columns, so we find the P-value using (2-1)(2-1) = 1 degree of freedom. The P-value is the area under the chi-square distribution with 1 degree of freedom to the right of the test statistic; this area is smaller than 0.005.

Since the P-value is less than the level of significance α = 0.01, we reject the null hypothesis. There is sufficient evidence to conclude that personality and activity preference are dependent at the α = 0.01 level of significance.
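The whole test can be retraced in a few lines. This is a sketch under assumptions: the cell counts are reconstructed from the fractions quoted in the relative-risk discussion (12/40, 28/40, 43/60, 17/60), the statistic is the standard Pearson form without any continuity correction, and the P-value uses the closed form that holds only for 1 degree of freedom.

```python
import math

# Rows: introvert, extrovert. Columns: amusement park, retreat.
observed = [[12, 28],
            [43, 17]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Pearson chi-square: sum of (O - E)^2 / E over all cells.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total
        chi_square += (o - e) ** 2 / e

# With 1 degree of freedom the right-tail area is erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2.0))

alpha = 0.01
reject_null = p_value < alpha

print(round(chi_square, 3), p_value, reject_null)
```

With these counts the computed P-value falls below 0.005, consistent with the decision to reject the null hypothesis at the 0.01 level.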

Relative risk analysis after a significant chi-square test of independence (2 × 2 tables): the risk ratio. Does the distribution of subjects across the levels of one variable change as a function of the levels of the other variable? Are introverts distributed across activities the same way as extroverts? Look at proportions of row totals. Among people who prefer the amusement park, is the proportion of introverts the same as the proportion of extroverts?

Is the proportion of people who prefer the amusement park the same for introverts and extroverts? Compare 30% with 71.7%. The risk ratio (12/40)/(43/60) = 0.419 is the relative risk of preferring the amusement park: the probability of preferring the amusement park if you are an introvert (12/40) is 0.419 times the probability of preferring the amusement park if you are an extrovert (43/60). Is this a significant difference?

Is the proportion of people who prefer the retreat the same for introverts and extroverts? Compare 70% with 28.3%. The risk ratio (28/40)/(17/60) = 2.47 is the relative risk of preferring the retreat: the probability of preferring the retreat if you are an introvert (28/40) is 2.47 times the probability of preferring the retreat if you are an extrovert (17/60). Is this a significant difference?
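The two risk ratios can be checked with a short calculation. The counts come from the fractions quoted above; `relative_risk` is a hypothetical helper name, and the sketch only computes the ratios, not a test of whether they differ significantly from 1.

```python
def relative_risk(events_a, total_a, events_b, total_b):
    """Ratio of the proportion in group A to the proportion in group B."""
    return (events_a / total_a) / (events_b / total_b)

# Group A: introverts (n = 40). Group B: extroverts (n = 60).
rr_park = relative_risk(12, 40, 43, 60)      # preferring the amusement park
rr_retreat = relative_risk(28, 40, 17, 60)   # preferring the retreat

print(round(rr_park, 3), round(rr_retreat, 3))
```

A ratio below 1 means the outcome is less likely for introverts than for extroverts, and a ratio above 1 means it is more likely.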
