
14

Chi-Square (Social Justice Focus)

The chi-square test is a powerful tool for uncovering patterns in categorical data—patterns that often reveal how resources, opportunities, and social outcomes are distributed across different groups. Whether we are examining disparities in housing access, differences in voting participation, or unequal experiences within the criminal legal system, chi-square allows us to test whether what we observe in the real world differs meaningfully from what we would expect in a just and equitable society. By comparing actual counts to expected counts, this test helps us move beyond assumptions and identify concrete areas where inequity persists. In this chapter, you will learn how to calculate and interpret chi-square results and use them to shine light on social issues where disproportionality matters most.

We come at last to our final topic: chi-square (χ²). This test is a special form of analysis called a nonparametric test, so its structure will look a little different from what we have done so far. However, the logic of hypothesis testing remains unchanged. The purpose of chi-square is to understand the frequency distribution of a single categorical variable or to find a relationship between two categorical variables, which is frequently a very useful way to look at our data.

The Power of Chi-Square in Social Justice Research

The chi-square test is one of the most important tools for understanding patterns of inequality and representation. It allows us to examine whether the differences we see across social categories, such as race, gender, class, or ability, are larger than we would expect by chance alone, that is, whether they are statistically significant. From a social justice perspective, chi-square analysis helps reveal when systems or institutions are treating groups unequally. Whether we are studying disparities in hiring, access to housing, voter suppression, or educational outcomes, chi-square tests provide evidence to challenge “color-blind” or “neutral” claims. In short, chi-square analysis empowers researchers and advocates to quantify inequities, identify systemic bias, and transform raw data into evidence for social change.

Categories and Frequency Tables

Our data for the chi square test are categorical (specifically, nominal) variables. Recall from Unit 1 that nominal variables have no specified order and can only be described by their names and the frequencies with which they occur in the dataset. Thus, unlike the other variables we have tested, we cannot describe our data for the chi square test using means and standard deviations. Instead, we will use frequency tables.

Table 14.1 gives an example of a frequency table used for a chi square test. The columns represent the different categories within our single variable, which in this example is community resource priority. The chi square test can assess as few as two categories, and there is no technical upper limit on how many categories can be included in our variable. However, as with ANOVA, having too many categories makes our computations long and our interpretation difficult. The final column in the table is the total number of observations, or N. The chi square test assumes that each observation comes from only one person and that each person provides only one observation, so our total number of observations will always equal our sample size.

Table 14.1. Community Resource Priorities

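|          | Affordable Housing | Mental Health Services | Youth Programs | Total |
|----------|--------------------|------------------------|----------------|-------|
| Observed | 14                 | 17                     | 5              | 36    |
| Expected | 12                 | 12                     | 12             | 36    |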

There are two rows in this table. The first row gives the observed frequencies of each category from our dataset. In this example, 14 people reported affordable housing as a priority, 17 people reported mental health services as a priority, and 5 people reported youth programs as a priority. The second row gives expected values, which are what we would find if each category had equal representation. The calculation for an expected value is:

$$E = \frac{N}{C}$$

where N is the total number of people in our sample and C is the number of categories in our variable (also the number of columns in our table). For Table 14.1, E = 36/3 = 12. The expected values correspond to the null hypothesis for chi square tests: equal representation of categories. Our first of two chi square tests, the test for goodness of fit, will assess how well our data line up with, or deviate from, this assumption.
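As a small illustration of the expected-value formula, the Python sketch below computes E = N/C from a list of observed counts. The helper name `expected_equal` is our own hypothetical choice. The sketch assumes the equal-representation null hypothesis described above.

```python
def expected_equal(observed):
    """Expected count per category when all categories are equally likely: E = N / C."""
    n_total = sum(observed)        # N, the total number of observations
    n_categories = len(observed)   # C, the number of categories (columns)
    return [n_total / n_categories] * n_categories

# Observed counts from Table 14.1: affordable housing, mental health services, youth programs
observed = [14, 17, 5]
print(expected_equal(observed))  # [12.0, 12.0, 12.0]
```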

Test for Goodness of Fit

The test for goodness of fit assesses one categorical variable against a null hypothesis of equally sized frequencies. Equal frequency distributions are what we would expect to get if categorization were completely random. In theory, we could also test against a specific distribution of category sizes if we had a good reason to. If we have information about how a population is distributed, we could compare our observed sample distribution to the values we would expect if the sample followed the same distribution as the population.

Hypotheses

All chi square tests, including the test for goodness of fit, are nonparametric tests. This means that there is no population parameter we are estimating or testing against; we are working only with our sample data. Because of this, there are no mathematical statements for chi square hypotheses. This should make sense because the mathematical hypothesis statements were always about population parameters (e.g., μ), so if we are nonparametric, we have no parameters and therefore no mathematical statements.

We do, however, still state our hypotheses verbally. For chi square tests for goodness of fit, our null hypothesis is that there is an equal number of observations in each category. That is, the categories do not differ in how prevalent they are. Our alternative hypothesis says that the categories do differ in their frequency. We do not have directional hypotheses for chi square, which matches our lack of mathematical statements.

Degrees of Freedom and the Table

Our degrees of freedom for the chi square test are based on the number of categories we have in our variable, not on the number of people or observations like it was for our other tests. Luckily, they are still as simple to calculate:

$$df = C - 1$$

So for our community priority example, we have 3 categories, and thus 2 degrees of freedom. Our degrees of freedom, along with our significance level (still defaulted to α = .05), are used to find our critical values in the chi square table, a portion of which is shown in Table 14.2. (The complete chi square table can be found in Appendix E.) Because we do not have directional hypotheses for chi square tests, we do not need to differentiate between critical values for one- or two-tailed tests. In fact, just like our tests for regression and ANOVA, all chi square tests are one-tailed: the critical region lies only in the upper tail. Because the deviations are squared, any departure from the expected values, in either direction, can only make the statistic larger.

Table 14.2. Critical Values for Chi-Square (χ² Table)

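Proportion in Critical Region

| df | .10    | .05    | .025   | .01    | .005   |
|----|--------|--------|--------|--------|--------|
| 1  | 2.706  | 3.841  | 5.024  | 6.635  | 7.879  |
| 2  | 4.605  | 5.991  | 7.378  | 9.210  | 10.597 |
| 3  | 6.251  | 7.815  | 9.348  | 11.345 | 12.838 |
| 4  | 7.779  | 9.488  | 11.143 | 13.277 | 14.860 |
| 5  | 9.236  | 11.070 | 12.833 | 15.086 | 16.750 |
| 6  | 10.645 | 12.592 | 14.449 | 16.812 | 18.548 |
| 7  | 12.017 | 14.067 | 16.013 | 18.475 | 20.278 |
| 8  | 13.362 | 15.507 | 17.535 | 20.090 | 21.955 |
| 9  | 14.684 | 16.919 | 19.023 | 21.666 | 23.589 |
| 10 | 15.987 | 18.307 | 20.483 | 23.209 | 25.188 |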
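If you want to check a table value, or need one that is not listed, the Python sketch below reproduces critical values like those in Table 14.2 using only the standard library. It assumes integer degrees of freedom. For that case, the upper-tail probability of the chi-square distribution has a known closed form, and the sketch searches for the critical value by bisection. The helper names `chi2_sf` and `chi2_critical` are hypothetical. In practice, statistical software does the same job; for example, SciPy provides `scipy.stats.chi2.ppf(1 - alpha, df)`.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum of (x/2)^i / i! for i = 0 .. df/2 - 1
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus (df - 1)/2 correction terms
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)  # first correction term
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)  # next term: multiply by x / (2r + 1)
    return total

def chi2_critical(alpha, df, lo=0.0, hi=1000.0, tol=1e-9):
    """Value x with P(X >= x) = alpha, found by bisection (chi2_sf decreases in x)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid  # tail area still too big: the critical value lies further right
        else:
            hi = mid
    return (lo + hi) / 2.0

alphas = [0.10, 0.05, 0.025, 0.01, 0.005]
for df in range(1, 11):
    print(df, " ".join(f"{chi2_critical(a, df):.3f}" for a in alphas))
```

Each printed row should match the corresponding row of Table 14.2 to three decimals.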

Chi-Square Statistic

The calculation of our test statistic in chi square tests combines information from our observed frequencies (O) and our expected frequencies (E) for each level of our categorical variable. For each cell (category), we find the difference between the observed and expected values, square it, and divide by the expected value. We then sum these values across cells to get our test statistic. This is shown in the formula:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

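For our community resource priority data, we would have:

$$\chi^2 = \frac{(14 - 12)^2}{12} + \frac{(17 - 12)^2}{12} + \frac{(5 - 12)^2}{12} = \frac{4}{12} + \frac{25}{12} + \frac{49}{12} = \frac{78}{12} = 6.50$$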

Notice that, for each cell’s calculation, the expected value in the numerator and the expected value in the denominator are the same value. Let’s now take a look at an example from start to finish.
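The formula translates directly into code. Below is a minimal Python sketch applied to the Table 14.1 counts. The helper `chi_square_statistic` is a hypothetical name used only for illustration.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all cells; both lists must be in the same category order."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [14, 17, 5]    # Table 14.1 observed counts
expected = [12, 12, 12]   # Table 14.1 expected counts (36 / 3)
stat = chi_square_statistic(observed, expected)
print(round(stat, 2))  # 6.5
```

The same function works for any number of categories, as long as the observed and expected values are listed in the same order.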

Example: Support for Ending Cash Bail

Cash bail policies—which often require people to pay money to secure their release while awaiting trial—have long raised serious social justice concerns. Research shows that cash bail disproportionately harms low-income people and communities of color, keeping legally innocent individuals incarcerated simply because they cannot afford to pay. To better understand public attitudes toward reform, we gather data from a group of adults, asking whether they support ending cash bail (Yes or No).

Step 1: State the Hypotheses

We begin with our hypotheses. Our null hypothesis of no difference states that an equal number of people will support and not support ending cash bail. Our alternative hypothesis is that one position is more common than the other.

H0: An equal number of people are in support and not in support of ending cash bail.

Ha: The number of people who support ending cash bail differs from the number who do not support it.

Step 2: Find the Critical Value

To avoid any potential bias in this crucial analysis, we will leave alpha at its typical level of α = .05. We have two options in our data (Yes or No), which gives us two categories. Based on this, we will have 1 degree of freedom. From our chi square table, we find a critical value of 3.84.

Step 3: Calculate the Test Statistic

The results of the data collection are presented in Table 14.3. We had data from 45 people in all and 2 categories, so our expected values are E = 45/2 = 22.50.

Table 14.3. Support for Ending Cash Bail

|          | Yes   | No    | Total |
|----------|-------|-------|-------|
| Observed | 26    | 19    | 45    |
| Expected | 22.50 | 22.50 | 45    |

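We can use these to calculate our chi square statistic:

$$\chi^2 = \frac{(26 - 22.50)^2}{22.50} + \frac{(19 - 22.50)^2}{22.50} = \frac{12.25}{22.50} + \frac{12.25}{22.50} = 1.09$$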

Step 4: Make the Decision

Our observed test statistic had a value of 1.09, and our critical value was 3.84. Our test statistic was smaller than our critical value, so we fail to reject the null hypothesis. Based on our sample of 45 people, there is no significant difference between the observed and expected values for support or nonsupport for ending cash bail, χ²(1, N = 45) = 1.09, p = .297.
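The whole goodness-of-fit test for the cash bail data can be sketched in a few lines of Python. The helper `goodness_of_fit` is hypothetical and assumes equal expected counts. The p-value line relies on a fact that holds only for 1 degree of freedom: the upper-tail chi-square probability then equals erfc(√(x/2)), which Python's `math.erfc` computes.

```python
import math

def goodness_of_fit(observed):
    """Chi-square goodness-of-fit statistic and df against equal expected counts."""
    n_total = sum(observed)
    expected = n_total / len(observed)  # E = N / C, the same for every category
    stat = sum((o - expected) ** 2 / expected for o in observed)
    df = len(observed) - 1              # df = C - 1
    return stat, df

stat, df = goodness_of_fit([26, 19])    # Yes, No counts from Table 14.3
# Valid for df = 1 only: P(chi-square >= x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(stat / 2))
critical = 3.841                         # Table 14.2: df = 1, alpha = .05
print(f"chi2({df}, N = 45) = {stat:.2f}, p = {p_value:.3f}, reject H0: {stat > critical}")
```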

Contingency Tables for Two Variables

The test for goodness of fit is a useful tool for assessing a single categorical variable. However, what is more common is wanting to know if two categorical variables are related to one another. This type of analysis is similar to a correlation, the only difference being that we are working with nominal data, which violates the assumptions of traditional correlation coefficients. This is where the test for independence comes in handy.

As noted above, our only description for nominal data is frequency, so we will again present our observations in a frequency table. When we have two categorical variables, our frequency table is crossed. That is, each combination of levels from each categorical variable is presented. This type of frequency table is called a contingency table because it shows the frequency of each category in one variable, contingent upon the specific level of the other variable.

An example contingency table is shown in Table 14.4, which displays whether or not 168 college students experienced housing insecurity as children (Yes/No) and whether the students’ final decision about which college to attend was influenced by the college’s basic-needs support systems (Yes, primary; Yes, somewhat; No).

Table 14.4. Contingency table of childhood housing insecurity and the role of basic-needs support in college decision-making

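| Childhood Housing Insecurity (rows) / Basic-Needs Support (columns) | Primary | Somewhat | No | Total |
|------------|---------|----------|----|-------|
| Yes        | 47      | 26       | 14 | 87    |
| No         | 21      | 23       | 37 | 81    |
| Total      | 68      | 49       | 51 | 168   |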

In contrast to the frequency table for our test for goodness of fit, our contingency table does not contain expected values, only observed data. Within our table, wherever our rows and columns cross, we have a cell. A cell contains the frequency of observing its corresponding specific levels of each variable at the same time. The top left cell in Table 14.4 shows us that 47 people in our study experienced housing insecurity as a child and reported that basic-needs support was a primary factor in deciding which college to attend.

Cells are numbered based on which row they are in (rows are numbered top to bottom) and which column they are in (columns are numbered left to right). We always name the cell using (R,C), with the row first and the column second. Based on this convention, the top left cell, containing our 47 participants who experienced housing insecurity as a child and used basic-needs support as a primary criterion, is cell (1,1). Next to it is cell (1,2). It contains the 26 people who did experience housing insecurity as a child, but for whom the college's basic-needs support only somewhat affected their decision. The remaining cells are numbered the same way. We only number the cells where our categories cross. We do not number our total cells, which have their own special name: marginal values.

Marginal values are the total values for a single category of one variable, added up across levels of the other variable. In Table 14.4, the marginal values appear in the Total row and the Total column. We can see that, in total, 87 of our participants (47 + 26 + 14) reported having experienced housing insecurity growing up and 81 (21 + 23 + 37) did not. The total of these two marginal values is 168, the total number of people in our study. Likewise, 68 people used basic-needs support as a primary criterion for deciding which college to attend, 49 considered it somewhat, and 51 did not consider it at all. The total of these marginal values is also 168, our total number of people. The marginal values for rows and columns will always both add up to the total number of participants, N, in the study. If they do not, then a calculation error was made and you must go back and check your work.
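The marginal-value check can be mirrored in a short Python sketch. It stores Table 14.4 as a list of rows, a layout we chose for illustration, and sums across rows and down columns.

```python
# Table 14.4 observed counts: rows = childhood housing insecurity (Yes, No),
# columns = basic-needs support in college choice (Primary, Somewhat, No)
observed = [
    [47, 26, 14],
    [21, 23, 37],
]

row_totals = [sum(row) for row in observed]        # R_i: sum across each row
col_totals = [sum(col) for col in zip(*observed)]  # C_j: sum down each column
n_total = sum(row_totals)                          # N

# Both sets of marginal values must add up to the same N
assert sum(row_totals) == sum(col_totals) == n_total
print(row_totals, col_totals, n_total)  # [87, 81] [68, 49, 51] 168
```

A table computed this way passes the check automatically. The same comparison is most useful when you tally a table by hand, where it catches transcription errors.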

Expected Values of Contingency Tables

Our expected values for contingency tables are based on the same logic as they were for frequency tables. Now, however, we must incorporate how frequently each row and column was observed (the marginal values) and how many people were in the sample overall (N). Together, these tell us what the frequencies would be under random chance. Specifically:

$$E_{ij} = \frac{R_i C_j}{N}$$

The subscripts i and j indicate which row and column, respectively, correspond to the cell whose expected frequency we are calculating. $R_i$ and $C_j$ are the row and column marginal values, respectively, and N is still the total sample size. Using the data from Table 14.4, we can calculate the expected frequency for cell (1,1), childhood housing insecurity with basic-needs support as the primary criterion in deciding which college to attend:

$$E_{11} = \frac{(87)(68)}{168} = 35.21$$

We can follow the same math to find all the expected values for this table:

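| Housing Insecurity (rows) / Basic Needs Support (columns) | Primary | Somewhat | No    | Total |
|-------|---------|----------|-------|-------|
| Yes   | 35.21   | 25.38    | 26.41 | 87    |
| No    | 32.79   | 23.62    | 24.59 | 81    |
| Total | 68      | 49       | 51    | 168   |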

Notice that the marginal values still add up to the same totals as before. This is because each expected frequency divides a row's total among the columns in the same proportions as the column totals, and each column's total among the rows in the same proportions as the row totals. The overall total, N, is also unchanged.
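The expected-value formula can be applied to every cell at once. The Python sketch below again stores Table 14.4 as a list of rows, our own illustrative layout. It computes each $E_{ij} = R_i C_j / N$ and confirms that the expected table keeps the observed marginal totals.

```python
# Observed counts from Table 14.4, stored as a list of rows (illustrative layout)
observed = [
    [47, 26, 14],  # childhood housing insecurity: Yes
    [21, 23, 37],  # childhood housing insecurity: No
]
row_totals = [sum(row) for row in observed]        # R_i
col_totals = [sum(col) for col in zip(*observed)]  # C_j
n_total = sum(row_totals)                          # N

# E_ij = R_i * C_j / N for every cell
expected = [[r * c / n_total for c in col_totals] for r in row_totals]

# The expected table keeps the observed row and column totals
assert all(abs(sum(row) - r) < 1e-9 for row, r in zip(expected, row_totals))
assert all(abs(sum(col) - c) < 1e-9 for col, c in zip(zip(*expected), col_totals))

for row in expected:
    print([round(e, 2) for e in row])
```

Rounded to two decimals, the printed values should match the expected-frequency table above.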

The observed and expected frequencies can be used to calculate the same chi square statistic as we calculated for the test for goodness of fit. Before we get there, though, we should look at the hypotheses and degrees of freedom used for contingency tables.

Test for Independence

The chi square test performed on contingency tables is known as the test for independence. In this analysis, we are looking to see whether the values of each categorical variable (that is, the frequencies of their levels) are related to or independent of the values of the other categorical variable. Because we are still doing a chi square test, which is nonparametric, we still do not have mathematical versions of our hypotheses. The actual interpretations of the hypotheses are quite simple. The null hypothesis says that the variables are independent, or not related, and the alternative hypothesis says that they are not independent, or that they are related. Using this setup and the data provided in Table 14.4, let's formally test whether having experienced housing insecurity as a child is related to using basic-needs support as a criterion for selecting a college to attend.

We will follow the same four-step procedure as we have since Chapter 7.

Step 1: State the Hypotheses

Our null hypothesis of no difference will state that there is no relationship between our variables, and our alternative will state that our variables are related.

H0: There is no association between whether students experienced housing insecurity as children and whether basic-needs support (primary, somewhat, no) influenced their college choice.

Ha: There is an association between childhood housing insecurity and whether basic-needs support (primary, somewhat, no) influenced students’ college choice.

Step 2: Find the Critical Value

Our critical value will come from the same table that we used for the test for goodness of fit, but our degrees of freedom will change. Because we now have rows and columns (instead of just columns), our new degrees of freedom use information from both:

$$df = (R - 1)(C - 1)$$

In our example:

$$df = (2 - 1)(3 - 1) = 2$$

Based on our 2 degrees of freedom, our critical value from our table is 5.991.

Step 3: Calculate the Test Statistic

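The same formula for chi square is used once again:

$$\chi^2 = \frac{(47 - 35.21)^2}{35.21} + \frac{(26 - 25.38)^2}{25.38} + \frac{(14 - 26.41)^2}{26.41} + \frac{(21 - 32.79)^2}{32.79} + \frac{(23 - 23.62)^2}{23.62} + \frac{(37 - 24.59)^2}{24.59}$$

$$\chi^2 = 3.948 + 0.015 + 5.831 + 4.239 + 0.016 + 6.263 = 20.31$$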

Step 4: Make the Decision

The final decision for our test of independence is still based on our observed value (20.31) and our critical value (5.991). Because our observed value is greater than our critical value, we can reject the null hypothesis.

Reject H0. Based on our data from 168 people, we can say that there is a statistically significant relationship between whether someone experienced housing insecurity as a child and the influence a college's basic-needs support has on that person's decision about which college to attend, χ²(2, N = 168) = 20.31, p < .05.
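Putting the pieces together, the Python sketch below runs the full test for independence on Table 14.4. The helper `independence_test` is hypothetical. The p-value line holds only for df = 2, where the upper-tail probability reduces to exp(−χ²/2), so an assertion guards it.

```python
import math

def independence_test(observed):
    """Chi-square test-for-independence statistic and df for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]        # R_i
    col_totals = [sum(col) for col in zip(*observed)]  # C_j
    n_total = sum(row_totals)                          # N
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n_total  # E_ij = R_i C_j / N
            stat += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)  # df = (R - 1)(C - 1)
    return stat, df

stat, df = independence_test([[47, 26, 14], [21, 23, 37]])  # Table 14.4
assert df == 2, "the closed-form p-value below only holds for df = 2"
p_value = math.exp(-stat / 2)   # for df = 2: P(chi-square >= x) = exp(-x / 2)
critical = 5.991                # Table 14.2: df = 2, alpha = .05
print(f"chi2({df}, N = 168) = {stat:.2f}, p = {p_value:.5f}, reject H0: {stat > critical}")
```

The sketch uses unrounded expected values. Its individual cell terms can therefore differ slightly from a hand calculation that rounds expected values to two decimals, but the total still rounds to 20.31.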

Conclusion:

The chi-square test is a powerful tool for uncovering patterns in categorical data that signal whether social experiences are distributed equitably across different groups. By comparing observed frequencies to what would be expected by chance, chi-square helps us see when structural forces—such as housing insecurity, discrimination, resource access, or policy barriers—shape people’s opportunities in ways that are not random. In social justice research, this matters greatly: detecting associations between lived experiences and outcomes can reveal inequities that might otherwise remain hidden in simple averages or narratives. Whether we are examining disparities in educational pathways, health outcomes, employment practices, or interactions with social institutions, chi-square enables us to document unequal patterns, question why they exist, and begin imagining data-driven interventions that promote fairness and support communities most impacted by systemic inequality.

Exercises

  1. What does a frequency table display? What does a contingency table display?
  2. What does a test for goodness of fit assess?
  3. How do expected frequencies relate to the null hypothesis?
  4. What does a test for independence assess?
  5. Compute the expected frequencies for the following contingency table:

    |            | Category A | Category B |
    |------------|------------|------------|
    | Category C | 22         | 38         |
    | Category D | 16         | 14         |

  6. Test significance and find effect sizes for the following tests:
    1. N = 19, R = 3, C = 2, χ²(2) = 7.89, α = .05
    2. N = 12, R = 2, C = 2, χ²(1) = 3.12, α = .05
    3. N = 74, R = 3, C = 3, χ²(4) = 28.41, α = .01
  7. You hear a lot of people claim that The Empire Strikes Back is the best movie in the original Star Wars trilogy, and you decide to collect some data to demonstrate this empirically (pun intended). You ask 48 people which of the original movies they liked best; 8 said A New Hope was their favorite, 23 said The Empire Strikes Back was their favorite, and 17 said Return of the Jedi was their favorite. Perform a chi square test on these data at the .05 level of significance.
  8. A pizza company wants to know whether people order pepperoni, sausage, and cheese pizzas equally often. They look at how many of each were ordered in the last week. Fill out the rest of the frequency table and test for a difference.

    |          | Pepperoni | Sausage | Cheese | Total |
    |----------|-----------|---------|--------|-------|
    | Observed | 320       | 275     | 251    |       |
    | Expected |           |         |        |       |

  9. A university administrator wants to know if there is a difference in proportions of students who go on to grad school across different majors. Use the data below to test whether there is a relationship between college major and going to grad school.

    | Graduate School (rows) / Major (columns) | Psychology | Business | Math |
    |-----|------------|----------|------|
    | Yes | 32         | 8        | 36   |
    | No  | 15         | 41       | 12   |

  10. A company you work for wants to make sure they are not discriminating against anyone in their promotion process. You have been asked to look across gender to see if there are differences in promotion rate (i.e., if gender and promotion rate are independent or not). The following data should be assessed at the normal level of significance:

    | Gender (rows) / Promoted in Last Two Years? (columns) | Yes | No |
    |-------|-----|----|
    | Women | 8   | 5  |
    | Men   | 9   | 7  |

Answers to Odd-Numbered Exercises

1)
Frequency tables display observed category frequencies and (sometimes) expected category frequencies for a single categorical variable. Contingency tables display the frequency of observing people in crossed category levels for two categorical variables, and (sometimes) the marginal totals of each variable level.

3)
Expected values are what we would observe if the proportion of categories were completely random (i.e., no consistent difference other than chance), which is the same as what the null hypothesis predicts to be true.

5)
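Observed:

|            | Category A | Category B | Total |
|------------|------------|------------|-------|
| Category C | 22         | 38         | 60    |
| Category D | 16         | 14         | 30    |
| Total      | 38         | 52         | 90    |

Expected:

|            | Category A | Category B | Total |
|------------|------------|------------|-------|
| Category C | 25.33      | 34.67      | 60    |
| Category D | 12.67      | 17.33      | 30    |
| Total      | 38         | 52         | 90    |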

7)

Step 1: H0: “There is no difference in preference for one movie,” HA: “There is a difference in how many people prefer one movie over the others.”
Step 2: Three categories (columns) gives df = 2, χ²crit = 5.991
Step 3: Based on the given frequencies:

|          | New Hope | Empire | Jedi | Total |
|----------|----------|--------|------|-------|
| Observed | 8        | 23     | 17   | 48    |
| Expected | 16       | 16     | 16   | 48    |

χ² = 7.13. This is a statistically significant result.

Step 4: Our obtained statistic is greater than our critical value, so we reject H0. Based on our sample of 48 people, there is a statistically significant difference in the proportion of people who prefer one Star Wars movie over the others, χ²(2, N = 48) = 7.13, p < .05.

9)
Step 1: H0: “There is no relationship between college major and going to grad school,” HA: “Going to grad school is related to college major.”
Step 2: df = 2, χ²crit = 5.991
Step 3: Based on the expected frequencies:

| Graduate School (rows) / Major (columns) | Psychology | Business | Math  |
|-----|------------|----------|-------|
| Yes | 24.81      | 25.86    | 25.33 |
| No  | 22.19      | 23.14    | 22.67 |

χ² = 2.09 + 12.34 + 4.49 + 2.33 + 13.79 + 5.02 = 40.05

Step 4: Obtained statistic is greater than the critical value, reject H0. Based on our data, there is a statistically significant relationship between college major and going to grad school, χ²(2, N = 144) = 40.05, p < .05.

 


License


Introduction to Statistics Copyright © 2025 by Susan Miller, Ph.D. and Christina Timmons is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.
