Calculating Expected Counts (8.4.1) | AP Statistics Notes

AP Syllabus focus:
‘Explain the process of calculating expected counts in the cells of a two-way table using the formula: Expected Count = (Row Total × Column Total) / Table Total. Clarify that these expected counts reflect what we would anticipate under the null hypothesis that the variables are independent of each other.’

Calculating expected counts in a two-way table allows statisticians to quantify what cell frequencies should look like when two categorical variables are assumed to be independent.

Understanding Expected Counts in Two-Way Tables

Expected counts play a central role in chi-square procedures for categorical data. When analyzing a two-way table, we compare observed counts to the counts we would expect if there were no association between the variables. These expected counts reflect the predictions made under the null hypothesis of independence, which states that knowing one variable’s category provides no information about the distribution of the other variable.

The Role of the Null Hypothesis

Under the null hypothesis of independence, the distribution of one categorical variable remains constant across the categories of the other variable. Because of this assumption, the expected count for each cell is derived from the product of the relevant row total and column total, scaled by the overall number of observations.

Null Hypothesis of Independence: The claim that two categorical variables are not associated, meaning the distribution of one variable is the same within every category of the other.

With this underlying framework, expected counts become the benchmark for assessing whether the differences between observed and expected counts are large enough to suggest an association between the variables.

Formula for Calculating Expected Counts

Expected counts quantify the frequency we would predict for each cell if independence truly holds. This calculation is essential for constructing the chi-square statistic used later in inference procedures.

EQUATION

$\text{Expected Count} = \dfrac{\text{Row Total} \times \text{Column Total}}{\text{Table Total}}$
$\text{Row Total}$ = Total number of observations in the row
$\text{Column Total}$ = Total number of observations in the column
$\text{Table Total}$ = Overall sample size across the table

This formula builds each expected count from marginal proportions: the cell receives the same share of its row total as its column holds of the whole table, which is the pattern that independence would produce.
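One way to see why the formula encodes independence is to rewrite it using marginal proportions. The derivation below is a sketch: the symbols $n$ (table total), $R_i$ (total of row $i$), $C_j$ (total of column $j$) and $E_{ij}$ (expected count for the cell in row $i$ and column $j$) are notation introduced here for illustration, and the marginal proportions are estimated from the sample.

```latex
\begin{align*}
E_{ij} &= n \times P(\text{row } i) \times P(\text{column } j) \\
       &= n \times \frac{R_i}{n} \times \frac{C_j}{n} \\
       &= \frac{R_i \times C_j}{n}
\end{align*}
```

The first line uses the multiplication rule for independent events, so the final expression matches the expected count formula only when independence is assumed.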

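Each expected count is therefore determined entirely by the table's marginal totals rather than chosen arbitrarily. For example, a cell in a row with total 40 and a column with total 60, in a table of 120 observations, has an expected count of (40 × 60) / 120 = 20.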

Why Expected Counts Matter

Expected counts allow analysts to measure how far the observed data deviate from independence. By comparing observed and expected frequencies, the chi-square test evaluates whether differences are small enough to attribute to chance variation or large enough to suggest a relationship between variables.

Key Purposes of Expected Counts

Provide a baseline for comparison with observed counts under the assumption of independence.
Enable calculation of the chi-square statistic, which depends on the difference between observed and expected values.
Support inference decisions by determining whether deviations indicate an association between variables.

Expected counts are not guesses; they are mathematically derived predictions grounded in the joint structure of the table and the null hypothesis.
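As a rough illustration of how observed and expected counts feed into the chi-square statistic, the sketch below sums (observed − expected)² / expected over every cell. The function name and the 2 × 2 table are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E over every cell of two same-shaped tables."""
    total = 0.0
    for observed_row, expected_row in zip(observed, expected):
        for o, e in zip(observed_row, expected_row):
            total += (o - e) ** 2 / e
    return total


# Hypothetical observed counts (illustrative numbers, not from these notes).
observed = [
    [20, 30],
    [30, 20],
]
# Row totals are 50 and 50, column totals are 50 and 50, table total is 100,
# so every expected count is (50 * 50) / 100 = 25.
expected = [
    [25.0, 25.0],
    [25.0, 25.0],
]

statistic = chi_square_statistic(observed, expected)
print(statistic)  # 4.0
```

Each cell contributes (5²) / 25 = 1 here, so the four cells together give a statistic of 4.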

Process for Calculating Expected Counts

The computation follows a clear, repeatable sequence that ensures consistency across all cells in the table.

Steps for Determining Expected Counts

Identify the row total for the row containing the cell of interest.
Determine the column total for the corresponding column.
Use the overall table total, which is the sample size.
Substitute these values into the expected count formula.
Repeat the calculation for every cell to create a complete table of expected frequencies.

These steps make the procedure systematic, enabling students to confidently evaluate expected values prior to conducting a chi-square test.
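The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a required method: the function name `expected_counts` and the 2 × 3 table of observed counts are assumptions made for the example.

```python
def expected_counts(observed):
    """Return expected counts for a two-way table given as a list of rows."""
    # Steps 1-3: find the row totals, column totals, and overall table total.
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    table_total = sum(row_totals)
    # Steps 4-5: apply (row total * column total) / table total to every cell.
    return [
        [row_total * column_total / table_total for column_total in column_totals]
        for row_total in row_totals
    ]


# Hypothetical observed counts (2 rows, 3 columns), invented for illustration.
observed = [
    [10, 20, 30],
    [20, 20, 20],
]

expected = expected_counts(observed)
print(expected)  # [[15.0, 20.0, 25.0], [15.0, 20.0, 25.0]]
```

Both rows have a total of 60, so they receive identical expected counts. The expected counts in each row and each column add up to the same totals as the observed table.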

Figure: A two-way table of observed counts by gender and major, showing the row totals, column totals, and overall table total used to compute expected counts.

Interpreting Expected Counts

Expected counts help illustrate how the distribution of one categorical variable would appear if it remained constant across levels of the other variable.

Figure: A two-way table showing both counts and percentages, illustrating how proportional distributions can be compared across categorical variables. It extends slightly beyond the syllabus but reinforces how expected counts reflect marginal proportions.

Insights from Expected Counts

Large deviations from expected counts may indicate possible association between variables.
Small deviations suggest that random variation likely explains the observed counts.
Expected counts of at least 5 in every cell are typically required for valid chi-square inference; this condition is checked on the expected counts, not the observed counts.

Even though expected counts themselves do not produce conclusions, they serve as the foundational reference point for all chi-square analyses involving two-way tables.
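A check of the minimum expected count can be sketched as follows, assuming the common guideline that every expected count should be at least 5. The function name, the default threshold parameter, and the table are hypothetical.

```python
def meets_large_counts_condition(expected, minimum=5):
    """Return True when every expected count is at least `minimum`."""
    return all(count >= minimum for row in expected for count in row)


# Hypothetical expected counts from a table with row totals 20 and 10,
# column totals 18 and 12, and table total 30.
expected = [
    [12.0, 8.0],
    [6.0, 4.0],
]

condition_met = meets_large_counts_condition(expected)
print(condition_met)  # False, because one expected count (4.0) is below 5
```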

Connecting Expected Counts to Independence

Because the formula incorporates both row and column totals, it embodies the idea of independence directly. If the variables are unrelated, the distribution of one variable should be the same within every category of the other. Expected counts formalize this proportionality.

Essential Conceptual Connections

Independence implies proportionality, which expected counts model mathematically.
Expected counts represent the distribution predicted solely by marginal totals, without influence from joint patterns.
Any systematic deviation from expected counts is a potential signal of dependence.
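The link between independence and proportionality can be illustrated numerically. In the sketch below (hypothetical function names and data), the row proportions of the expected table are identical in every row, while the row proportions of the observed table are not.

```python
def expected_counts(observed):
    """Return expected counts for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    table_total = sum(row_totals)
    return [
        [row_total * column_total / table_total for column_total in column_totals]
        for row_total in row_totals
    ]


def row_proportions(table):
    """Convert each row of counts into proportions of that row's total."""
    return [[count / sum(row) for count in row] for row in table]


# Hypothetical observed counts, invented for illustration.
observed = [
    [30, 10],
    [20, 40],
]

expected = expected_counts(observed)
print(expected)                   # [[20.0, 20.0], [30.0, 30.0]]
print(row_proportions(expected))  # [[0.5, 0.5], [0.5, 0.5]]
print(row_proportions(observed))  # rows differ: 0.75 and 0.25 in the first row
```

Every row of the expected table follows the overall column split (50 out of 100 in each column), which is what "predicted solely by marginal totals" means in practice.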

FAQ

Why are expected counts based on marginal totals rather than on the observed cell counts?

Expected counts represent the frequencies that would arise purely from the overall distribution of each variable if the variables were independent. Observed cell frequencies already reflect the joint behaviour of the variables, which may include associations.

Marginal totals allow expected counts to be calculated without contamination from patterns within the table, ensuring they reflect only the structure implied by independence.

How does the number of rows or columns affect the expected counts?

As the number of rows or columns increases, each expected count generally becomes smaller because the total sample is spread across more cells.

This makes it more likely that some expected counts will fall below recommended minimum thresholds, which can weaken the reliability of chi-square inference.

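What happens to the expected counts if the sample size increases while the marginal proportions stay the same?

All expected counts increase proportionally because marginal totals grow while their ratios remain unchanged.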

With larger expected counts, sampling variability becomes smaller relative to the counts themselves, making it easier to detect deviations from independence.
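The proportional scaling can be checked with a small sketch. The helper `expected_counts` and the table are hypothetical; doubling every observed count doubles the sample size while keeping the marginal proportions the same.

```python
def expected_counts(observed):
    """Return expected counts for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    table_total = sum(row_totals)
    return [
        [row_total * column_total / table_total for column_total in column_totals]
        for row_total in row_totals
    ]


# Hypothetical observed counts and the same table with every count doubled.
observed = [
    [10, 20],
    [30, 40],
]
doubled = [[2 * count for count in row] for row in observed]

expected = expected_counts(observed)
expected_doubled = expected_counts(doubled)
print(expected)          # [[12.0, 18.0], [28.0, 42.0]]
print(expected_doubled)  # [[24.0, 36.0], [56.0, 84.0]]
```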

Why can an expected count be a non-integer?

Expected counts reflect theoretical frequencies based on proportional assumptions rather than actual observed individuals, so they may not be whole numbers.

This is entirely acceptable because expected counts are used for comparison in the chi-square statistic, not as actual tallied outcomes.
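A quick numerical illustration, using hypothetical totals chosen so that the result is not a whole number:

```python
# Hypothetical totals, invented for illustration.
row_total = 30
column_total = 45
table_total = 100

# Expected count = (row total * column total) / table total.
expected = row_total * column_total / table_total
print(expected)  # 13.5
```

The value 13.5 is kept as it is and should not be rounded before it is used in the chi-square statistic.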

Can a cell have a large expected count but a small observed count?

Yes. Because expected counts depend solely on marginal totals, a cell may have a high expected value if both its row and column totals are large, even when the observed count is small.

Such discrepancies often indicate potential departures from independence and can substantially influence the chi-square statistic.

Practice Questions

Question 1 (1–3 marks)
A two-way table summarises data for 200 individuals across two categorical variables. One cell corresponds to a row total of 50 and a column total of 80.
Calculate the expected count for this cell under the assumption that the variables are independent.

Question 1
• Correct application of the expected count formula: (50 × 80) / 200. (1 mark)
• Correct expected count: 20. (1 mark)

Question 2 (4–6 marks)
A researcher records whether students prefer studying in the morning or evening and whether they revise alone or in groups. The resulting two-way table shows:

Morning/Alone: 32
Morning/Group: 48
Evening/Alone: 28
Evening/Group: 52

The row totals are Morning: 80, Evening: 80.
The column totals are Alone: 60, Group: 100.
The table total is 160.

(a) Calculate the expected count for the Morning/Alone cell under independence.
(b) Calculate the expected count for the Evening/Group cell under independence.
(c) Explain how the expected counts would be used in a chi-square test of independence.

Question 2

(a) Morning/Alone expected count
• Correct formula: (80 × 60) / 160. (1 mark)
• Correct value: 30. (1 mark)

(b) Evening/Group expected count
• Correct formula: (80 × 100) / 160. (1 mark)
• Correct value: 50. (1 mark)

(c) Explanation
• States that expected counts represent what would be anticipated if the variables were independent. (1 mark)
• States that comparing observed and expected counts forms the basis of calculating the chi-square statistic, which assesses whether deviations indicate an association. (1 mark)
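The Question 2 values can be checked with a short sketch. The observed counts are those given in the question; the helper name `expected_counts` is hypothetical, and the chi-square calculation goes one step beyond what the question asks, as an illustration of part (c).

```python
def expected_counts(observed):
    """Return expected counts for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    table_total = sum(row_totals)
    return [
        [row_total * column_total / table_total for column_total in column_totals]
        for row_total in row_totals
    ]


# Observed counts from Question 2.
# Rows: Morning, Evening. Columns: Alone, Group.
observed = [
    [32, 48],
    [28, 52],
]

expected = expected_counts(observed)
print(expected)  # [[30.0, 50.0], [30.0, 50.0]]

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for observed_row, expected_row in zip(observed, expected)
    for o, e in zip(observed_row, expected_row)
)
print(round(chi_square, 3))  # 0.427
```

Every observed count is within 2 of its expected count, which is why the statistic is small.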


Written by:

Dr Rahil Sachak-Patwa

Oxford University - PhD Mathematics

Rahil spent ten years working as a private tutor, teaching students for GCSEs, A-Levels, and university admissions. During his PhD he published papers on modelling infectious disease epidemics and was a tutor to undergraduate and master's students for mathematics courses.