Cross-Tabs Continued Andrew Martin PS 372 University of Kentucky
Statistical Independence Statistical independence is a property of two variables in which the probability that an observation is in a particular category of one variable and a particular category of the other variable equals the product of the simple (marginal) probabilities of being in those categories. Unlike other statistical measures discussed in class, statistical independence indicators test for a lack of a relationship between two variables.
Statistical Independence Let us assume two nominal variables, X and Y. The values for these variables are as follows: X:  a, b, c,  ... Y:  r, s, t , ...
Statistical Independence P(X=a) stands for the probability that a randomly selected case has property or value a on variable X. P(Y=r) stands for the probability that a randomly selected case has property or value r on variable Y. P(X=a, Y=r) stands for the joint probability that a randomly selected observation has both property a and property r simultaneously.
Statistical Independence If X and Y are statistically independent: P(X= a , Y= r ) = [P(X= a )][P(Y= r )] for all  a  and  r .
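The product rule above can be checked cell by cell. The following is a minimal sketch, not the lecture's own method: the 2x2 tables of counts are hypothetical, and the function name is_independent and the tolerance tol are illustrative choices.

```python
def is_independent(table, tol=1e-9):
    """Return True if every joint proportion equals the product of its marginals.

    `table` is a list of rows of counts: rows are categories of X,
    columns are categories of Y.
    """
    n = sum(sum(row) for row in table)
    row_p = [sum(row) / n for row in table]        # P(X = a) for each row
    col_p = [sum(col) / n for col in zip(*table)]  # P(Y = r) for each column
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            joint = count / n                      # P(X = a, Y = r)
            if abs(joint - row_p[i] * col_p[j]) > tol:
                return False
    return True


# Hypothetical counts, chosen only for illustration.
independent_table = [[70, 30],
                     [140, 60]]
dependent_table = [[100, 50],
                   [50, 100]]

print(is_independent(independent_table))  # True
print(is_independent(dependent_table))    # False
```

The check works on sample proportions, so it describes the table at hand; whether the same holds in the population is what the chi-square test addresses.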
Statistical Independence
If gender and turnout are independent: (Total obs in column m × Total obs in row v) / N = expected frequency for cell mv
Statistical Independence (Total obs in column m × Total obs in row v) / N = expected frequency for cell mv. Here, (210 × 100) / 300 = 70, so 70 is the expected frequency. Because the observed and expected frequencies are the same, the variables are independent.
(150 × 150) / 300 = 75
Here, the variables are not independent (that is, they are dependent) because the expected frequency (75) differs from the observed frequency (100).
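The two expected-frequency calculations on these slides can be reproduced in a few lines. This is a small sketch using the totals quoted in the slides; the function name expected_frequency is an illustrative choice, not something from the course materials.

```python
def expected_frequency(row_total, column_total, n):
    """Expected cell count under independence: (row total * column total) / N."""
    return row_total * column_total / n


# Gender and turnout example: column total 210, row total 100, N = 300.
print(expected_frequency(100, 210, 300))  # 70.0

# Second example: column total 150, row total 150, N = 300.
expected = expected_frequency(150, 150, 300)
observed = 100
print(expected)              # 75.0
print(observed == expected)  # False, so this cell departs from independence
```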
Testing for Independence How do we test for independence for an entire cross-tabulation table? A statistic used to test the statistical significance of a relationship in a cross-tabulation table is the chi-square (χ²) statistic.
Chi-Square Statistic The chi-square statistic essentially compares an observed result—the table produced by the data—with a hypothetical table that would occur if, in the population, the variables were statistically independent.
How is the chi-square statistic calculated? The chi-square test is set up just like a hypothesis test. The observed chi-square value is compared to the critical value for a chosen critical region. A statistic is calculated for each cell of the cross-tabulation, using the same comparison of observed and expected frequencies used to check independence.
How is the chi-square statistic calculated? For each cell: (Observed frequency − expected frequency)² / expected frequency
Chi-Square Test The null hypothesis is statistical independence between X and Y. H0: X and Y are independent. The alternative hypothesis is that X and Y are not independent. HA: X and Y are dependent.
Chi-Square Test The chi-square is a family of distributions, each of which depends on its degrees of freedom. The degrees of freedom equal the number of rows minus one times the number of columns minus one: df = (r − 1)(c − 1). Level of significance: the probability (α) of incorrectly rejecting a true null hypothesis.
Chi-Square Test Critical value: The chi-square test is always a one-tail test. Choose the critical value of chi-square from a tabulation to make the critical region (the region of rejection) equal to α. (JRM: Appendix C, pg. 577)
Chi-Square Test The observed χ² is the sum, over all cells, of the squared difference between the observed and expected frequencies divided by the expected frequency. If χ² obs ≥ χ² crit, reject the null hypothesis. Otherwise, do not reject.
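The calculation described above can be sketched as follows. The tables of counts are hypothetical and chosen only for illustration (the second one matches the 150/150/300 totals used earlier), and the function name chi_square is an assumption, not part of the course materials.

```python
def chi_square(observed):
    """Sum over cells of (observed - expected)^2 / expected.

    `observed` is a list of rows of counts. Expected counts are
    (row total * column total) / N, the value implied by independence.
    """
    n = sum(sum(row) for row in observed)
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n
            statistic += (obs - exp) ** 2 / exp
    return statistic


# Hypothetical table in which observed and expected counts coincide.
print(chi_square([[70, 30], [140, 60]]))             # 0.0

# Hypothetical table with expected count 75 in every cell.
print(round(chi_square([[100, 50], [50, 100]]), 2))  # 33.33
```

A statistic of zero means the observed table is exactly the table expected under independence; larger values indicate larger departures from it.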
 
Chi-Square Test Let's assume we want to test the relationship at the .01 level. The observed χ² is 62.21. The degrees of freedom are (5 − 1)(2 − 1) = 4. The critical χ² is 13.28. Since 62.21 > 13.28, we can reject the null hypothesis of independence. Y (attitudes toward gun control) is dependent on X (gender).
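The decision in this example can be reproduced with a short sketch. The observed and critical values are the ones quoted on the slide; the function names are illustrative. As an added cross-check that is not part of the slides, the code uses the closed-form upper-tail probability of the chi-square distribution, which holds only for even degrees of freedom, to confirm that 13.28 leaves about .01 in the tail when df = 4.

```python
import math


def degrees_of_freedom(rows, cols):
    """df = (r - 1)(c - 1) for an r-by-c cross-tabulation."""
    return (rows - 1) * (cols - 1)


def upper_tail_even_df(x, df):
    """P(chi-square > x) for even df, using the closed form
    exp(-x/2) * sum_{i < df/2} (x/2)^i / i!  (valid for even df only)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2
    return math.exp(-half) * sum(
        half ** i / math.factorial(i) for i in range(df // 2)
    )


def reject_null(chi2_obs, chi2_crit):
    """Decision rule: reject independence when observed >= critical."""
    return chi2_obs >= chi2_crit


df = degrees_of_freedom(5, 2)
print(df)                                       # 4
print(round(upper_tail_even_df(13.28, df), 3))  # 0.01
print(reject_null(62.21, 13.28))                # True
```

For odd degrees of freedom the closed form does not apply, and the critical value would come from a printed table or a statistics library instead.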
Chi-Square Test The χ² statistic works for dependent variables that are ordinal or nominal measures, but another statistic is more appropriate for interval- and ratio-level data.
