AP Syllabus focus:
‘Learning Objective: Calculate the appropriate statistic for a chi-square test for homogeneity or independence. Essential Knowledge: The chi-square statistic formula: Chi-square statistic = Σ((Observed count - Expected count)^2 / Expected count), with degrees of freedom calculated as (number of rows - 1) * (number of columns - 1). This statistic is crucial for evaluating the discrepancy between observed and expected frequencies in a chi-square test for homogeneity or independence.’
The chi-square statistic is a fundamental tool for evaluating whether observed categorical data differ meaningfully from what would be expected under a specified null hypothesis of no association.
Understanding the Purpose of the Chi-Square Statistic
The chi-square statistic measures the overall discrepancy between observed counts and expected counts in a two-way table. In the context of chi-square tests for homogeneity or independence, this statistic provides a numerical summary of how far the sample data depart from the pattern predicted by the null hypothesis. A larger statistic indicates stronger evidence against the null hypothesis because the observed frequencies differ more substantially from expected frequencies.
Because the chi-square statistic evaluates categorical data by comparing counts, not proportions, it is well suited for assessing distributional differences or potential associations between variables. Each cell’s expected count is the value predicted if the null hypothesis were true, so the size of the deviations from those counts measures how far the data depart from the null hypothesis.
Key Components of the Statistic
The chi-square statistic incorporates two essential elements:
Observed counts, which represent the actual data collected in each cell of the two-way table.
Expected counts, which represent what would be anticipated if the null hypothesis—no association (independence) or no difference in distributions (homogeneity)—were correct.
Comparing these values across all cells allows analysts to determine whether patterns in the data reflect random variation or statistically meaningful discrepancies.
In a chi-square test for homogeneity or independence, data are organized in a two-way table whose cells contain observed counts, the actual frequencies recorded in each category combination.

A two-way contingency table displaying observed counts for gender and sport preference. The marginals and grand total illustrate how raw data are structured before computing expected counts and the chi-square statistic. Source.
The Chi-Square Formula
The chi-square statistic is constructed by summing standardized squared differences across all cells in the table. Each term in the summation represents a single cell’s contribution to the overall measure of discrepancy.
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
$\chi^2$ = Chi-square statistic, a measure of discrepancy between observed and expected counts
$O$ = Observed count in a table cell
$E$ = Expected count in a table cell
This formulation ensures that the statistic increases as observed counts deviate further from the expected pattern. Cells with larger differences contribute more heavily to the overall value, but dividing by the expected count measures each difference relative to the size of that cell. Thus, cells with large expected counts do not dominate the statistic simply because their counts are large; the same absolute difference contributes more in a cell with a small expected count.
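A minimal Python sketch of one cell's term, (O − E)²/E, using made-up counts chosen only to show the scaling: the same absolute difference of 5 contributes ten times as much when the expected count is 10 as when it is 100. The function name `cell_contribution` is a hypothetical helper for illustration.

```python
def cell_contribution(observed: float, expected: float) -> float:
    """Return one cell's term (O - E)^2 / E of the chi-square sum."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return (observed - expected) ** 2 / expected

# Same absolute difference (5), different expected counts (hypothetical values).
small_cell = cell_contribution(15, 10)    # 25 / 10  = 2.5
large_cell = cell_contribution(105, 100)  # 25 / 100 = 0.25
print(small_cell, large_cell)             # prints: 2.5 0.25
```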
Every cell of the table contributes exactly one term, built from its observed and expected counts, so the statistic reflects the whole two-way table. Because these terms accumulate, the chi-square statistic can detect patterns that span multiple categories even when no single cell shows a large discrepancy on its own.
Each expected count represents what you would anticipate in that cell if the null hypothesis of no difference in distributions or no association were true.

A two-way table of expected counts for gender and sport preference, showing the formula used to compute each cell’s expected value under the null hypothesis. The explicit calculations displayed exceed the minimal expectations for this subsubtopic but reinforce how expected frequencies are generated for use in the chi-square formula. Source.
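The expected-count rule can be sketched in a few lines of Python. This is a minimal illustration, not a library routine: `expected_counts` is a hypothetical helper, the table is stored as a list of rows, and the 2×3 counts are invented. Each expected count is (row total × column total) / grand total.

```python
def expected_counts(observed: list[list[float]]) -> list[list[float]]:
    """Expected count for each cell under H0: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Hypothetical 2x3 table (rows: two groups, columns: three preferences).
observed = [
    [20, 30, 50],
    [30, 20, 50],
]
for row in expected_counts(observed):
    print(row)  # prints [25.0, 25.0, 50.0] for each row
```

The expected table keeps the same row and column totals as the observed table, which is why only some of its cells are free to vary.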
Understanding Degrees of Freedom
Degrees of freedom play an essential role in determining the distribution of the chi-square statistic and therefore influence p-value calculations used for inferential decisions.
$$df = (r - 1)(c - 1)$$
$df$ = Degrees of freedom for a chi-square test
$r$ = Number of rows in the two-way table
$c$ = Number of columns in the two-way table
These degrees of freedom reflect the number of cell counts that can vary freely once the row and column totals have been fixed. As the number of categories increases, the degrees of freedom grow, and the chi-square distribution shifts to the right and becomes more spread out, changing both the shape of the distribution and the critical values associated with significance levels.
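A minimal Python sketch of the degrees-of-freedom rule. `degrees_of_freedom` evaluates (r − 1)(c − 1); the hypothetical helper `fill_2x2` illustrates the "free to vary" idea for a 2×2 table: once both sets of margins are fixed, choosing the top-left cell determines the other three, so df = 1. The margins (30, 20) and (25, 25) are made-up values.

```python
def degrees_of_freedom(n_rows: int, n_cols: int) -> int:
    """df = (r - 1)(c - 1) for a chi-square test on an r x c table."""
    return (n_rows - 1) * (n_cols - 1)

def fill_2x2(top_left: int, row_totals: tuple[int, int],
             col_totals: tuple[int, int]) -> list[list[int]]:
    """With both margins fixed, one free cell determines the rest of a 2x2 table."""
    top_right = row_totals[0] - top_left
    bottom_left = col_totals[0] - top_left
    bottom_right = row_totals[1] - bottom_left
    return [[top_left, top_right], [bottom_left, bottom_right]]

print(degrees_of_freedom(3, 3))  # 3 exercise types x 3 age groups: prints 4
print(degrees_of_freedom(2, 2))  # prints 1
print(fill_2x2(12, (30, 20), (25, 25)))  # prints [[12, 18], [13, 7]]
```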
Understanding degrees of freedom helps students interpret the chi-square statistic in a broader inferential context, as the same chi-square value may indicate different levels of evidence depending on the table’s complexity.
The resulting chi-square statistic is paired with degrees of freedom calculated as (number of rows − 1)(number of columns − 1), preparing it for use with the chi-square distribution in later inference steps.

Probability density curves for chi-square distributions with various degrees of freedom. As degrees of freedom increase, the distribution becomes less right-skewed, illustrating how table size affects the interpretation of the chi-square statistic. Extra detail about distribution shapes goes slightly beyond the subsubtopic but strengthens conceptual understanding. Source.
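To illustrate how the same statistic can carry different evidence depending on the degrees of freedom, here is a minimal Python sketch of the upper-tail probability P(X > x) for a chi-square distribution. To stay dependency-free it uses a closed form that holds only for even df; the statistic 6.0 is a hypothetical value. For general use, a library routine such as `scipy.stats.chi2.sf(x, df)` handles any df.

```python
import math

def chi2_upper_tail_even_df(x: float, df: int) -> float:
    """P(X > x) for X ~ chi-square(df), valid only for positive even df.

    Uses the identity P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!, with k = df / 2.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch handles positive even df only")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

statistic = 6.0  # the same hypothetical chi-square value, judged at several df
for df in (2, 4, 6):
    print(df, round(chi2_upper_tail_even_df(statistic, df), 4))
# prints: 2 0.0498, then 4 0.1991, then 6 0.4232
```

The same value of 6.0 is fairly unusual with 2 degrees of freedom but entirely ordinary with 6.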
Steps for Calculating the Chi-Square Statistic
The calculation of the chi-square statistic follows a structured process designed to ensure that all discrepancies between observed and expected counts are systematically evaluated.
Step-by-Step Process
Identify observed counts from the sample data in each cell of the two-way table.
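Compute expected counts using the assumption that the null hypothesis is true: expected count = (row total × column total) / table total. For independence, this equals the product of the row and column proportions multiplied by the table total; for homogeneity, it reflects every group sharing the same overall distribution across categories.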
Apply the chi-square formula, computing (O − E)²/E for each cell.
Sum all cell contributions to obtain the overall chi-square statistic.
Determine the degrees of freedom using the number of rows and columns in the table.
Prepare the statistic for inference, noting that larger values indicate greater departure from the null hypothesis.
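The six steps can be sketched end to end in Python. This is a minimal illustration, assuming a rectangular list-of-lists table in which every expected count is positive; the function name and the 2×3 counts are hypothetical. It stops at the statistic and degrees of freedom, matching the final step, and does not compute a p-value.

```python
def chi_square_statistic(observed: list[list[float]]) -> tuple[float, int]:
    """Return (chi-square statistic, degrees of freedom) for a two-way table of counts."""
    # Step 1: the observed counts are the input table itself.
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Step 2: expected count under H0 = row total * column total / grand total.
            expected = row_totals[i] * col_totals[j] / grand_total
            # Steps 3 and 4: add this cell's (O - E)^2 / E to the running sum.
            statistic += (obs - expected) ** 2 / expected

    # Step 5: degrees of freedom = (rows - 1)(columns - 1).
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    # Step 6: return both so they can be compared with a chi-square distribution.
    return statistic, df

observed = [
    [20, 30, 50],  # hypothetical group A
    [30, 20, 50],  # hypothetical group B
]
print(chi_square_statistic(observed))  # prints (4.0, 2)
```

For these counts each of the four off-balance cells contributes exactly 1, giving 4.0 with df = 2. If SciPy is available, `scipy.stats.chi2_contingency` should report the same statistic; it applies Yates' continuity correction by default when df = 1, so pass `correction=False` when comparing a 2×2 table against this formula.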
Each step contributes to building a systematic comparison between observed outcomes and theoretical expectations. By computing each cell’s contribution separately and then summing across the entire table, the chi-square statistic summarizes in a single number how well the data align with the null hypothesis.
Interpreting the Meaning of the Statistic
A large chi-square statistic implies that the observed counts differ substantially from the expected counts, suggesting that the sample provides evidence against the null hypothesis. Conversely, a small statistic indicates that any observed differences are minor and consistent with what random variation might produce.
This measure does not, by itself, determine statistical significance; it must be evaluated using a chi-square distribution with appropriate degrees of freedom. Nonetheless, understanding how the statistic is constructed enables students to interpret the results of chi-square tests with greater clarity and precision.
FAQ
Why is each squared difference divided by the expected count?
Dividing by the expected count standardises the discrepancy so that cells with larger expected frequencies do not dominate the statistic simply because of their scale.
This ensures that the chi-square statistic reflects relative, not absolute, differences. It also links directly to the theoretical chi-square distribution, which is derived using expected frequencies under the null hypothesis.
Can a single cell strongly affect the chi-square statistic?
A single cell with a large deviation from its expected count can substantially increase the chi-square statistic, even when other cells show little difference.
However, chi-square tests consider the cumulative evidence across all cells. A single-cell discrepancy may or may not be enough for statistical significance, depending on the size of the deviation and the degrees of freedom.
Why must the observed and expected tables have the same structure?
Observed and expected values must align cell-by-cell because the chi-square statistic evaluates each category combination separately.
Mixing structures (for example, changing category boundaries or totals) invalidates the comparison, as the expected values would no longer represent the null hypothesis corresponding to the observed data.
Can the chi-square statistic ever be negative?
No. Each cell’s contribution uses a squared difference, which is always non-negative, and division by a positive expected count.
Since the statistic is a sum of these non-negative values, the final chi-square value must also be non-negative. This aligns with the chi-square distribution, which takes only non-negative values.
How does adding more categories affect the chi-square statistic?
More categories introduce more cells, each adding a contribution to the chi-square statistic.
Even small deviations, accumulated across many cells, can result in a larger overall statistic. Because the degrees of freedom also increase with table size, the reference distribution shifts to the right, so a larger statistic is needed before it is considered unusual.
Practice Questions
Question 1 (1–3 marks)
A researcher collects data in a two-way table to investigate whether there is an association between type of exercise (yoga, cycling, running) and age group (under 30, 30–50, over 50).
Explain how the chi-square statistic is calculated from the observed and expected counts in the table.
Question 1
• 1 mark: States that the statistic is found by comparing observed and expected counts for each cell.
• 1 mark: Describes calculating (Observed − Expected) squared divided by Expected for each cell.
• 1 mark: States that these values are summed across all cells to give the chi-square statistic.
Question 2 (4–6 marks)
A school records the number of students in each year group who prefer one of four lunch options: hot meal, salad, sandwich, or packed lunch. The data are arranged in a two-way table with year group as rows and lunch preference as columns.
(a) Describe how the expected counts for each cell are obtained under the assumption that lunch preference is independent of year group.
(b) The school calculates the chi-square statistic. Explain how each cell contributes to the overall statistic and why larger discrepancies increase the value of the chi-square statistic.
(c) State how the degrees of freedom for this chi-square test would be determined.
Question 2
(a)
• 1 mark: States that expected counts are based on row totals, column totals, and the table total.
• 1 mark: States that expected count = (row total × column total) / overall total.
(b)
• 1 mark: States that each cell contributes a value of (Observed − Expected) squared divided by Expected.
• 1 mark: Explains that greater differences between observed and expected counts produce larger contributions to the chi-square statistic.
(c)
• 1 mark: Correctly gives degrees of freedom as (number of rows − 1) multiplied by (number of columns − 1).
