7.1.2 Independence and Homogeneity Test

The Chi-squared test below is used to investigate the relationship between two categorical variables.

$$\chi^2 = \sum_{i=1}^{R}\sum_{j=1}^{C} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},$$

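where $O_{ij}$ is the observed frequency in row $i$ and column $j$, and $E_{ij}$ is the corresponding expected frequency when there is no association between the two variables. $E_{ij}$ equals the product of the total of row $i$ and the total of column $j$, divided by the overall total $n$. In mathematical notation,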

$$E_{ij} = \frac{O_{i+} \times O_{+j}}{n},$$

see Table 7.1. The Chi-squared statistic has an approximate Chi-squared distribution with $(R - 1)(C - 1)$ degrees of freedom (d.f.), where $R$ is the number of rows and $C$ is the number of columns.

For example, suppose a researcher wants to investigate gender and religion, with gender in two categories (male/female) and religion in three: Buddhist, Christian, and other. The Chi-squared test can be thought of as testing either the independence of gender and religion or the homogeneity of the religion proportions across the two genders. The two views are equivalent: if there is no association (independence) between gender and religion, then the proportion of each religion should be the same among men as among women, and conversely, if those proportions are the same, then gender and religion are independent. Here $R = 2$ and $C = 3$, so the test has $(2 - 1)(3 - 1) = 2$ d.f.
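The computation above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any particular package: the function names `chi_squared_test` and `chi2_upper_tail` are hypothetical, the table is assumed to be a list of rows of counts with at least two rows and two columns and no zero row or column totals, and the p-value uses the standard closed-form upper tail of the Chi-squared distribution for integer d.f. The counts in the usage lines are made up for illustration only.

```python
import math


def chi_squared_test(observed):
    """Chi-squared test for an R x C table of observed counts O_ij.

    Hypothetical helper; assumes R >= 2, C >= 2 and all totals > 0.
    Returns (statistic, degrees_of_freedom, p_value, expected).
    """
    r = len(observed)
    c = len(observed[0])
    row_totals = [sum(row) for row in observed]                       # O_i+
    col_totals = [sum(row[j] for row in observed) for j in range(c)]  # O_+j
    n = sum(row_totals)                                               # overall total

    # Expected frequency under no association: E_ij = O_i+ * O_+j / n
    expected = [[row_totals[i] * col_totals[j] / n for j in range(c)]
                for i in range(r)]

    # chi^2 = sum over all cells of (O_ij - E_ij)^2 / E_ij
    statistic = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
                    for i in range(r) for j in range(c))

    df = (r - 1) * (c - 1)
    return statistic, df, chi2_upper_tail(statistic, df), expected


def chi2_upper_tail(x, k):
    """P(X >= x) for X ~ Chi-squared with k >= 1 degrees of freedom."""
    y = x / 2.0
    if k % 2 == 0:
        # Even k: exp(-y) * sum_{i=0}^{k/2 - 1} y^i / i!
        return math.exp(-y) * sum(y ** i / math.factorial(i) for i in range(k // 2))
    # Odd k: erfc(sqrt(y)) + exp(-y) * sum_{i=1}^{(k-1)/2} y^(i - 1/2) / Gamma(i + 1/2)
    tail = math.erfc(math.sqrt(y))
    for i in range(1, (k - 1) // 2 + 1):
        tail += math.exp(-y) * y ** (i - 0.5) / math.gamma(i + 0.5)
    return tail


# Made-up gender x religion counts, for illustration only.
# Columns: Buddhist, Christian, other.
observed = [
    [30, 20, 10],  # male
    [25, 30, 15],  # female
]
stat, df, p, expected = chi_squared_test(observed)
print(f"chi2 = {stat:.4f}, d.f. = {df}, p = {p:.4f}")
```

The tail function is written out only to keep the sketch dependency-free; in practice a statistics library would normally supply the Chi-squared distribution.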

                            Column
                1       2       ⋯       C       Row Totals
Row   1         O_11    O_12    ⋯       O_1C    O_1+
      2         O_21    O_22    ⋯       O_2C    O_2+
      ⋮         ⋮       ⋮       ⋱       ⋮       ⋮
      R         O_R1    O_R2    ⋯       O_RC    O_R+
Column Totals   O_+1    O_+2    ⋯       O_+C    n

Table 7.1: General R by C crosstab (Contingency Table).
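The layout of Table 7.1 can be reproduced from raw paired observations. The Python sketch below assumes each observation is a (row category, column category) tuple and that the category labels can be sorted; the functions `crosstab` and `print_crosstab` and the six observations are hypothetical, made up for illustration.

```python
from collections import Counter


def crosstab(pairs):
    """Count (row, column) pairs into an R x C table with its margins.

    Hypothetical helper: returns row labels, column labels, the observed
    counts O_ij, the row totals O_i+, the column totals O_+j, and n.
    Labels are sorted, so they must be mutually comparable.
    """
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Combinations that never occur get a count of 0 (Counter's default).
    table = [[counts[(r, c)] for c in cols] for r in rows]
    row_totals = [sum(row) for row in table]          # O_i+
    col_totals = [sum(col) for col in zip(*table)]    # O_+j
    n = sum(row_totals)                               # overall total
    return rows, cols, table, row_totals, col_totals, n


def print_crosstab(rows, cols, table, row_totals, col_totals, n, width=14):
    """Print the table in the layout of Table 7.1, right-aligned."""
    def line(cells):
        return "".join(f"{cell:>{width}}" for cell in cells)

    print(line(["", *cols, "Row Totals"]))
    for label, cell_counts, total in zip(rows, table, row_totals):
        print(line([label, *cell_counts, total]))
    print(line(["Column Totals", *col_totals, n]))


# Six made-up (gender, religion) observations, for illustration only.
pairs = [
    ("male", "Buddhist"), ("female", "Christian"), ("male", "other"),
    ("female", "Buddhist"), ("male", "Buddhist"), ("female", "Christian"),
]
print_crosstab(*crosstab(pairs))
```

The count matrix returned by `crosstab` holds the $O_{ij}$ of Table 7.1, and its margins are exactly the row totals $O_{i+}$, column totals $O_{+j}$, and $n$ used to form the expected frequencies.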