Cross-Tabs Continued
Andrew Martin
PS 372
University of Kentucky
Statistical Independence
Statistical independence is a property of two variables in which the probability that an observation falls in a particular category of one variable and a particular category of the other variable equals the product of the simple (marginal) probabilities of being in each of those categories. Unlike the other statistical measures discussed in class, tests of statistical independence examine whether two variables lack a relationship.
Let us assume two nominal variables, X and Y, with the following values:
X: a, b, c, ...
Y: r, s, t, ...
P(X = a) is the probability that a randomly selected case has property (value) a on variable X. P(Y = r) is the probability that a randomly selected case has property (value) r on variable Y. P(X = a, Y = r) is the joint probability that a randomly selected observation has both property a and property r simultaneously.
If X and Y are statistically independent:
$$P(X = a, Y = r) = P(X = a)\,P(Y = r) \quad \text{for all } a \text{ and } r.$$
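The product rule can be checked cell by cell. The Python sketch below assumes the joint probabilities are stored in a nested list where `joint[i][j]` is the probability of the i-th value of X together with the j-th value of Y; the helper name `is_independent`, the tolerance, and both probability tables are hypothetical illustrations, not data from the lecture.

```python
import math

def is_independent(joint, tol=1e-9):
    """Hypothetical helper: True if P(X=a, Y=r) == P(X=a) * P(Y=r) for every cell.

    joint[i][j] is the joint probability of the i-th value of X and the
    j-th value of Y (illustrative data layout; entries should sum to 1).
    """
    # Marginal probabilities: row sums give P(X=a), column sums give P(Y=r).
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    # The condition must hold "for all a and r", i.e., in every cell.
    return all(
        math.isclose(joint[i][j], p_x[i] * p_y[j], abs_tol=tol)
        for i in range(len(p_x))
        for j in range(len(p_y))
    )

# Made-up probability tables for illustration only.
independent = [[0.12, 0.28], [0.18, 0.42]]  # marginals 0.4/0.6 and 0.3/0.7
dependent = [[0.30, 0.10], [0.10, 0.50]]

print(is_independent(independent))  # True
print(is_independent(dependent))    # False
```

A tolerance is used because the products of floating-point marginals rarely match the stored joint probabilities exactly.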
If gender and turnout are independent, the expected frequency for the cell in column m and row v is:
$$E_{mv} = \frac{(\text{total obs. in column } m) \times (\text{total obs. in row } v)}{N}$$
For example, $\frac{210 \times 100}{300} = 70$, so 70 is the expected frequency. Because the observed frequency in this cell is also 70, the observed and expected frequencies are the same, and the variables are independent.
In another table with N = 300, $\frac{150 \times 150}{300} = 75$.
Here the variables are not independent (that is, they are dependent), because the expected frequency (75) is less than the observed frequency (100).
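The expected-frequency arithmetic from the two examples can be sketched in Python. `expected_frequency` is a hypothetical helper name; the totals (210, 100, 150, 150, N = 300) and the observed counts (70 and 100) are the ones given in the examples.

```python
def expected_frequency(column_total, row_total, n):
    """Hypothetical helper: expected cell count under independence,
    (column total * row total) / N."""
    return column_total * row_total / n

# First example: expected 70, observed 70.
e1 = expected_frequency(210, 100, 300)
# Second example: expected 75, observed 100.
e2 = expected_frequency(150, 150, 300)

print(e1, e1 == 70)   # 70.0 True  -> observed equals expected
print(e2, e2 == 100)  # 75.0 False -> observed differs from expected
```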
Testing for Independence
How do we test for independence across an entire cross-tabulation table? The chi-square test (χ²) is used to test the statistical significance of a relationship in a cross-tabulation table.
Chi-Square Statistic
The chi-square statistic compares an observed result (the table produced by the data) with a hypothetical table that would occur if, in the population, the variables were statistically independent.
How is the chi-square statistic calculated?
The chi-square test is set up like any other hypothesis test: the observed chi-square value is compared with the critical value that marks off the critical region for the chosen level of significance. A component of the statistic is calculated for each cell of the cross-tabulation, using the same expected frequency as in the independence calculation.
For each cell, the component is:
$$\frac{(\text{observed frequency} - \text{expected frequency})^2}{\text{expected frequency}}$$
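A one-line Python sketch of a single cell's component; `cell_component` is a hypothetical helper name, and the counts reuse the observed and expected frequencies from the two expected-frequency examples.

```python
def cell_component(observed, expected):
    """Hypothetical helper: one cell's contribution to chi-square, (O - E)^2 / E."""
    return (observed - expected) ** 2 / expected

# Cell where observed (70) equals expected (70): contributes nothing.
print(cell_component(70, 70))    # 0.0
# Cell with observed 100 and expected 75: (25^2) / 75, about 8.33.
print(cell_component(100, 75))
```

Dividing by the expected frequency scales each squared gap, so a difference of 25 counts for more in a cell where few cases are expected than in one where many are.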
The null hypothesis is statistical independence between X and Y:
$H_0$: X and Y are independent.
The alternative hypothesis is that X and Y are not independent:
$H_A$: X and Y are dependent.
The chi-square distribution is a family of distributions, each determined by its degrees of freedom. The degrees of freedom equal the number of rows minus one times the number of columns minus one:
$$df = (r - 1)(c - 1)$$
where r is the number of rows and c is the number of columns.
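The degrees-of-freedom rule as a small Python function; `degrees_of_freedom` is a hypothetical name, and the input check (at least two rows and two columns) is an added assumption about what counts as a valid cross-tab.

```python
def degrees_of_freedom(n_rows, n_cols):
    """Hypothetical helper: df for an r x c cross-tabulation, (r - 1)(c - 1)."""
    # Assumption: a meaningful cross-tab has at least 2 categories per variable.
    if n_rows < 2 or n_cols < 2:
        raise ValueError("a cross-tab needs at least 2 rows and 2 columns")
    return (n_rows - 1) * (n_cols - 1)

print(degrees_of_freedom(2, 2))  # 1  (a 2 x 2 table)
print(degrees_of_freedom(5, 2))  # 4  (5 rows, 2 columns)
```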
Level of significance: The probability (α) of incorrectly rejecting a true null hypothesis.
Critical value: The chi-square test is always a one-tailed test. Choose the critical value of chi-square from a chi-square table so that the critical region (the region of rejection) has probability α.
(JRM: Appendix C, pg. 577)
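When a printed table is not at hand, the critical value can be computed. In practice, `scipy.stats.chi2.ppf(1 - alpha, df)` returns it. The standard-library sketch below avoids that dependency: the hypothetical helpers `chi2_cdf` and `chi2_critical` evaluate the chi-square CDF through the series for the regularized lower incomplete gamma function, then bisect for the value whose right-tail (one-tailed) area equals α.

```python
import math

def chi2_cdf(x, df):
    """Hypothetical helper: chi-square CDF, computed as P(df/2, x/2), the
    regularized lower incomplete gamma function, via its power series."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    # P(a, z) = z^a e^(-z) / Gamma(a + 1) * sum_{n>=0} z^n / ((a+1)(a+2)...(a+n))
    term, total, n = 1.0, 1.0, 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (a + n)
        total += term
    log_prefactor = a * math.log(z) - z - math.lgamma(a + 1)
    return math.exp(log_prefactor) * total

def chi2_critical(alpha, df):
    """Hypothetical helper: value whose right-tail area is alpha (bisection)."""
    lo, hi = 0.0, 1.0
    # Widen the bracket until the right-tail area at hi drops below alpha.
    while 1.0 - chi2_cdf(hi, df) > alpha:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if 1.0 - chi2_cdf(mid, df) > alpha:
            lo = mid  # tail still too large: critical value lies to the right
        else:
            hi = mid
    return hi

print(round(chi2_critical(0.01, 4), 2))  # 13.28
print(round(chi2_critical(0.05, 1), 2))  # 3.84
```

Bisection works here because the right-tail area shrinks steadily as the cutoff moves right, so there is exactly one crossing point.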
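The observed χ² is the sum, over all cells, of the squared differences between observed and expected frequencies, each divided by the expected frequency: $\chi^2_{obs} = \sum (O - E)^2 / E$, where O and E are the observed and expected frequencies of each cell.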
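Putting the pieces together, a Python sketch that computes the observed χ² for a whole table of counts: it derives each cell's expected frequency from the row and column totals, then sums (O − E)² / E. The helper name `chi_square_statistic` and both tables of counts are hypothetical, not the lecture's data.

```python
def chi_square_statistic(observed):
    """Hypothetical helper: observed chi-square for a table of counts.

    E for each cell is (row total * column total) / N, the count
    expected if the two variables were statistically independent.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            chi2 += (o - e) ** 2 / e
    return chi2

# Made-up 2 x 2 tables of counts for illustration only.
dependent_table = [[30, 10],
                   [20, 40]]
independent_table = [[20, 20],
                     [30, 30]]

print(round(chi_square_statistic(dependent_table), 2))  # 16.67
print(chi_square_statistic(independent_table))          # 0.0
```

A table whose observed counts exactly match the independence table gives χ² = 0; larger departures from independence give larger values.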
If $\chi^2_{obs} \ge \chi^2_{crit}$, reject the null hypothesis. Otherwise, do not reject it.
Let's assume we want to test the relationship at the .01 level (α = .01).
The observed χ² is 62.21.
The degrees of freedom are (5 - 1)(2 - 1) = 4.
The critical χ² is 13.28.
Since 62.21 > 13.28, we reject the null hypothesis of independence.
Y (attitudes toward gun control) is dependent on X (gender).
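The decision rule and this worked example fit in a few lines of Python. `reject_null` is a hypothetical helper name; the numbers (α = .01, a 5 × 2 table, critical value 13.28, observed χ² of 62.21) are the ones from the example.

```python
def reject_null(chi2_obs, chi2_crit):
    """Hypothetical helper: reject H0 (independence) when observed chi-square >= critical value."""
    return chi2_obs >= chi2_crit

# Gun-control example: 5 rows x 2 columns, tested at alpha = .01.
df = (5 - 1) * (2 - 1)  # degrees of freedom
chi2_crit = 13.28       # table value for alpha = .01, df = 4
chi2_obs = 62.21        # observed chi-square from the example

print(df)                                # 4
print(reject_null(chi2_obs, chi2_crit))  # True -> reject independence
```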
The χ² statistic works for dependent variables measured at the nominal or ordinal level, but another statistic is more appropriate for interval- and ratio-level data.