The Chi-Square Test of Independence of Categorical Variables

DESMOND AYIM-ABOAGYE, PHD

Categorical Variables
 When there are two categorical variables, researchers are often interested in knowing
whether these variables are independent of one another or dependent on each other.
  By independent, we mean that each of the variables, and its influence, can be understood in
isolation from the other. Is more of a story revealed, however, when the association
between the two variables is examined? In other words, perhaps understanding one of
the variables really depends on being aware of its relationship to the other.
  For example, is the size of first-graders’ vocabularies at all related to whether their parents
read to them daily? Put another way, is the size of a child’s vocabulary independent of
being read to, or does verbal fluency depend on this regular activity?

The Chi-square test for independence
 The Chi-square test for independence indicates whether the frequencies
associated with two variables (with two or more categories each) are statistically
independent of or dependent upon one another.

                         News Sources
Educational Status   Television   Newspapers   Row Totals
College                  47           62          109
High School              58           39           97
Column Totals           105          101          206
The specialist randomly samples 206 adult residents from a community, asking them to indicate their
educational status (i.e., high school diploma or college diploma) and their source of news (i.e., television or
the newspaper).

Observation
  A glance at the cell entries inside the table suggests that college graduates are
more likely to rely on the newspaper as their source for news, whereas high school
graduates tend to use the television predominantly for gathering information.
  Note that the row totals and the column totals representing the respective
variables are very similar in magnitude.
  The chi-square test for independence will reveal whether this apparent relationship
between the two categorical variables is a dependent one (that is, both variables
need to be considered simultaneously) or whether they are actually independent
of one another.

Step 1
  Determine whether the data are measured on a nominal scale.

Null and Alternative hypotheses (Step 2)
  Alternative hypothesis:
  A dependent relationship exists between education and mode of news gathering.
  Null hypothesis:
  Participants’ level of education is independent of their mode of news acquisition.

Step 3
  Select a significance level (p value or α level) for the nonparametric test.

Step 4
  Step 4 entails the actual calculation of the nonparametric statistic, and here is the formula for
doing so:
  $\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(f_O - f_E)^2}{f_E}$
  r = number of rows, c = number of columns; $f_O$ and $f_E$ refer to the observed and expected
frequencies, respectively. The sums run over every cell of the table.
  If the calculated χ² equals or exceeds the critical value at the chosen significance level (e.g., α = .05), then
the null hypothesis of independence is rejected and the two variables are said to be in a dependent relationship
with one another.

Calculating the χ²
  Note that because there are two variables, the expected frequencies cannot
be identified by simply dividing the number of participants by the number of cells
available.
  Instead, we use a simple procedure called the “cell A” strategy to calculate the
expected frequencies for the four cells. By cell A, we refer to the somewhat
arbitrary identification of the upper left cell in any 2 x 2 table (see below).
               Television   Newspaper
 College           A            B
 High School       C            D

Calculating χ²
 The remaining three cells are labeled B, C, and D, accordingly. Our goal is to
calculate the expected frequency corresponding to each of these four cells.
  Cell A = (column total x row total) / N
  Cell A = (105 x 109) / 206
  Cell A = 55.56

Calculating χ²
  Cell B = (column total x row total) / N
  Cell B = (101 x 109) / 206
  Cell B = 53.44

Calculating χ²
 Cell labels                              Expected frequencies
               Television   Newspaper                    Television   Newspaper
 College           A            B          College          55.56        53.44
 High School       C            D          High School      49.44        47.56
When computing the χ² test for independence, be certain that the row totals and the column totals each sum to N;
any discrepancy from N indicates that a math error is present.
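
The cell-by-cell procedure above can be sketched in Python. This is only an illustration and not part of the original slides; the names `observed` and `expected_frequencies` are assumptions chosen for this example. It applies (row total x column total) / N to every cell of the education-by-news-source table.

```python
# Observed frequencies from the education-by-news-source table.
# Rows: College, High School. Columns: Television, Newspapers.
observed = [
    [47, 62],
    [58, 39],
]


def expected_frequencies(table):
    """Return the expected count (row total x column total) / N for each cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_frequencies(observed)
for label, row in zip(["College", "High School"], expected):
    # Round only for display; the unrounded values are kept in `expected`.
    print(label, [round(value, 2) for value in row])
```

As a check, the expected frequencies should add up to N = 206, just as the observed frequencies do.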

Formula:
  $\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(f_O - f_E)^2}{f_E}$
  χ² = (47 – 55.56)²/55.56 + (62 – 53.44)²/53.44 + (58 – 49.44)²/49.44 + (39 – 47.56)²/47.56
  χ² = 5.71.
  df = (r – 1)(c – 1)
  df = (2 – 1)(2 – 1)
  df = (1)(1)
  df = 1.
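
The hand calculation can be reproduced with a short Python sketch. The function name `chi_square_independence` is an assumption made for this illustration, and the code works with unrounded expected frequencies, so intermediate values may differ slightly from the rounded ones shown on the slide.

```python
# Observed frequencies: rows are College, High School;
# columns are Television, Newspapers.
observed = [[47, 62], [58, 39]]


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, f_o in enumerate(row):
            f_e = row_totals[i] * col_totals[j] / n  # expected frequency
            chi2 += (f_o - f_e) ** 2 / f_e           # one cell's contribution
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


chi2, df = chi_square_independence(observed)
print(round(chi2, 2), df)
```

All four cells contribute to the sum, and the statistic rounds to 5.71 with 1 degree of freedom.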

APA Style
 χ² (1, N = 206) = 5.71
  Table B.7 in Appendix B: check df = 1 and, under .05, find 3.841.
  χ² (1) = 5.71 ≥ χ² critical (1) = 3.84, so reject H0.
  The significant χ² statistic is then reported as:
  χ² (1, N = 206) = 5.71, p < .05.
  Please check Data Box 14B (p. 538).
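
The decision rule can also be checked numerically. The sketch below is an illustration, not the slides' method: it relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so its upper-tail probability is erfc(√(χ²/2)). The helper name `chi_square_p_value_df1` is hypothetical, and the shortcut is valid only when df = 1.

```python
import math


def chi_square_p_value_df1(chi2):
    """Upper-tail p value for a chi-square statistic with exactly 1 df."""
    return math.erfc(math.sqrt(chi2 / 2.0))


chi2 = 5.71
critical_value = 3.841  # tabled critical value for df = 1 at the .05 level

reject_null = chi2 >= critical_value     # table-based decision
p_value = chi_square_p_value_df1(chi2)   # p-value-based decision
print(reject_null, p_value < 0.05)
```

Both routes lead to the same conclusion for these data: the null hypothesis of independence is rejected at the .05 level.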

Phi (φ) Coefficient
  The phi coefficient can be calculated when an investigator is performing a chi-square analysis on a
2 x 2 contingency table. In a manner similar to the Pearson r, the phi coefficient provides a
measure of association between two dichotomous variables. The formula for this supporting
statistic is:
  $\phi = \sqrt{\chi^2 / N}$
  Using the data from the education and news acquisition example, we find that:
  φ = √(5.71 / 206)
  φ = √.0277
  φ = .17.
  The value of the phi coefficient can range between 0 and 1, where higher values indicate a greater degree of
association between the variables.
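
A minimal Python sketch of the same arithmetic, with `phi_coefficient` as an assumed helper name:

```python
import math


def phi_coefficient(chi2, n):
    """Phi for a 2 x 2 table: the square root of chi-square divided by N."""
    return math.sqrt(chi2 / n)


# Values from the education and news acquisition example.
phi = phi_coefficient(5.71, 206)
print(round(phi, 2))
```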

Cramer’s V statistic
 Cramer’s V statistic is used only when a contingency table is larger than the
standard 2 x 2 size.
 The formula for Cramer's V is:
  $V = \sqrt{\frac{\chi^2}{N(n-1)}}$

Statistically significant
  In the formula for Cramer’s V, n refers to the smaller of the number of rows and the number of
columns in a contingency table. If the table were a 3 x 4 design, then n would be equal to 3.
One last suggestion: Neither the phi coefficient nor Cramer’s V should be
calculated unless the χ² is statistically significant.
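
Cramer’s V can be sketched in Python as follows. The slides give no data for a table larger than 2 x 2, so the χ² and N values below are hypothetical and serve only to show the arithmetic; `cramers_v` is likewise an assumed helper name.

```python
import math


def cramers_v(chi2, n_total, n_rows, n_cols):
    """Cramer's V: sqrt(chi-square / (N * (n - 1))), n = min(rows, columns)."""
    smaller = min(n_rows, n_cols)
    return math.sqrt(chi2 / (n_total * (smaller - 1)))


# Hypothetical 3 x 4 table (not from the slides): chi-square = 12.0, N = 100.
# Here n = 3, so the denominator is 100 * (3 - 1) = 200.
v = cramers_v(chi2=12.0, n_total=100, n_rows=3, n_cols=4)
print(round(v, 3))
```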
