AP Syllabus focus:
‘Compare the p-value to the significance level α to make a decision about the null hypothesis. If the p-value ≤ α, reject the null hypothesis, indicating that there is a statistically significant difference between the observed and expected counts. If the p-value > α, fail to reject the null hypothesis, indicating insufficient evidence to suggest a difference between the observed and expected counts.’
Making inference decisions in chi-square goodness-of-fit tests requires comparing the p-value to a chosen significance level to determine whether observed categorical data meaningfully contradict expected distributions.
Understanding the Goal of Inference Decisions
Inference decisions in a chi-square goodness-of-fit test address the central question: Does the sample provide convincing statistical evidence that the observed distribution differs from the expected distribution under the null hypothesis? This judgment relies on interpreting the p-value, a probability calculated from the chi-square distribution, and comparing it with the predetermined significance level (α). The decision framework ensures consistent, objective evaluation of whether deviations between observed and expected counts are likely due to chance or represent meaningful differences in the population.
The Role of the Significance Level α
The significance level (α) represents the threshold at which the evidence is considered strong enough to reject the null hypothesis. Common values include 0.05, 0.01, and 0.10, depending on the research context. A smaller α indicates a stricter standard for rejecting the null hypothesis, reducing the likelihood of a Type I error (falsely rejecting a true null hypothesis). By establishing α before analyzing the data, researchers maintain unbiased decision criteria grounded in statistical convention.
Decision Structure When Comparing the p-value to α
At the heart of chi-square inference is a structured comparison between two quantities that reflect uncertainty and tolerance for error. This comparison produces one of two possible decisions, guiding how evidence from sample data informs conclusions about the population.
If the p-value ≤ α, the evidence against the null hypothesis is considered statistically significant.
If the p-value > α, the evidence is insufficient to reject the null hypothesis.
Only one decision is made—either rejecting or failing to reject the null hypothesis.
Statistical significance refers strictly to the decision outcome and does not imply practical importance or the size of deviations.
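The comparison above can be sketched in a few lines of Python. The function name `decide`, its returned strings, and the p-values in the usage lines are illustrative assumptions rather than part of any standard library or of the syllabus.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule: reject H0 when p-value <= alpha."""
    if not (0.0 <= p_value <= 1.0 and 0.0 < alpha < 1.0):
        raise ValueError("p_value must be in [0, 1] and alpha in (0, 1)")
    return "reject H0" if p_value <= alpha else "fail to reject H0"


# Hypothetical p-values, chosen only to exercise both branches.
print(decide(0.012, alpha=0.05))
print(decide(0.27, alpha=0.05))
```

Note that a p-value exactly equal to α falls on the "reject" side, matching the "≤" in the rule.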
Clarifying Key Terminology in Inference Decisions
Students often misinterpret the meaning of rejecting or failing to reject the null hypothesis. The decision communicates what the data suggest about population-level patterns, based on probability.
Null Hypothesis: A claim that the population's categorical distribution matches the expected (claimed) distribution; any deviations seen in the sample are attributed solely to random variation.
A null hypothesis specifies the population proportion for each category, which determines the expected counts and establishes a baseline model for comparison. Rejecting or failing to reject this claim hinges on the magnitude of the chi-square statistic and the resulting p-value.
After introducing the null hypothesis, it is equally important to clarify the meaning of the alternative.
Alternative Hypothesis: A claim that at least one population category proportion differs from the proportion specified in the null hypothesis.
The alternative hypothesis functions as the competing explanation, suggesting the observed discrepancies are too large to plausibly occur by chance alone.
Interpreting the p-value for Decision Making
The p-value is the probability, computed assuming the null hypothesis is true, of obtaining a chi-square statistic at least as large as the one observed. It therefore quantifies how surprising the observed statistic would be under the null hypothesis. Smaller p-values indicate that the observed discrepancies between counts are unlikely under the expected distribution. This interpretation highlights why the p-value serves as the foundation for inference decisions: it converts numerical deviations into probabilistic evidence.
When the p-value is less than or equal to α, researchers conclude that the observed distribution provides convincing evidence against the null hypothesis. Importantly, this does not prove the alternative hypothesis; rather, it indicates that random variation is an unlikely explanation for the observed discrepancies.
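As a rough illustration of where a p-value comes from, the sketch below computes the chi-square statistic from counts and then the right-tail probability using only the Python standard library. The helper names `chi2_sf` and `goodness_of_fit` and the die-roll counts are hypothetical; the tail probability uses a standard recurrence for the incomplete gamma function and assumes integer degrees of freedom.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Right-tail probability P(X >= x) for a chi-square variable, integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    z = x / 2.0
    # Start from a known closed form, then raise the shape parameter by 1 each
    # step using Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    while a < df / 2.0:
        q += z**a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def goodness_of_fit(observed, expected_props):
    """Return (chi-square statistic, degrees of freedom, p-value)."""
    n = sum(observed)
    expected = [n * p for p in expected_props]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return statistic, df, chi2_sf(statistic, df)


# Hypothetical data: 60 rolls of a die, tested against a fair-die model.
observed = [8, 9, 13, 7, 15, 8]
statistic, df, p_value = goodness_of_fit(observed, [1 / 6] * 6)
print(f"chi-square = {statistic:.2f}, df = {df}, p-value = {p_value:.3f}")
```

For these made-up counts the statistic is about 5.2 on 5 degrees of freedom and the p-value is near 0.39, so the deviations are the kind that a fair die would produce fairly often.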
Making the Inference Decision
The decision process for a chi-square goodness-of-fit test involves synthesizing the p-value interpretation with the established significance level. This ensures alignment with standardized statistical reasoning.
Decision When p-value ≤ α
When the p-value is small enough to fall at or below α, the discrepancy between observed and expected counts is considered statistically significant. In this case:
The null hypothesis is rejected.
The data provide evidence that at least one category proportion differs from what the null hypothesis specified.
The result indicates a statistically significant departure from the expected distribution, though not necessarily a large or practically important one.
This outcome signals that the model described by the null hypothesis does not adequately represent the population distribution reflected by the data.
Decision When p-value > α
When the p-value exceeds α, the evidence is insufficient to warrant rejecting the null hypothesis. In this case:
The null hypothesis is not rejected (or one “fails to reject” the null).
The observed deviations are considered consistent with what might occur through random sampling variation.
No claim is made that the distributions match perfectly; rather, there is inadequate evidence to declare a significant difference.
Connecting the Decision to Population Claims
Inference decisions extend beyond sample results by informing claims about the broader population. Rejecting the null hypothesis suggests a population-level discrepancy in categorical proportions, whereas failing to reject it indicates that sample evidence does not convincingly challenge the expected distribution. These decisions guide interpretations about real-world categorical patterns while grounding conclusions in statistical probability rather than certainty.

This graph displays chi-square probability density curves for several degrees of freedom, illustrating how the distribution’s right tail represents extreme chi-square values used when evaluating evidence against the null hypothesis.

This normal distribution diagram highlights rejection regions in both tails, demonstrating that α defines the extreme values that lead to rejection of the null hypothesis, a general decision rule shared across significance tests. In a chi-square goodness-of-fit test, by contrast, the rejection region lies only in the right tail.
FAQ
The chi-square statistic sums squared deviations between observed and expected counts, each divided by the expected count, so it can never be negative. Larger values indicate greater discrepancy.
Because of this structure, only unusually large chi-square values provide evidence against the null hypothesis. The right tail of the chi-square distribution therefore corresponds to outcomes that would be highly unlikely if the expected distribution were true.
A smaller significance level, such as 0.01, makes it harder to reject the null hypothesis, as only very strong evidence would be considered sufficient.
A larger significance level, such as 0.10, increases the likelihood of rejecting the null hypothesis, as the threshold for evidence is lower.
Choosing α balances the risk of rejecting a true null hypothesis (a Type I error) against the risk of failing to reject a false one (a Type II error).
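A minimal sketch of this effect, assuming a single hypothetical p-value of 0.03, applies the same decision rule at three common significance levels:

```python
# Hypothetical p-value from a single goodness-of-fit test.
p_value = 0.03

decisions = {}
for alpha in (0.01, 0.05, 0.10):
    # Same rule at every level: reject H0 when p-value <= alpha.
    decisions[alpha] = "reject H0" if p_value <= alpha else "fail to reject H0"
    print(f"alpha = {alpha:.2f}: {decisions[alpha]}")
```

The data and the p-value are unchanged across the three lines; only the threshold moves, which is why α should be fixed before looking at the results.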
A chi-square goodness-of-fit test only indicates whether at least one category differs from expectation but does not identify which one.
However, large individual contributions to the chi-square statistic can offer clues.
• Categories with unusually large residuals often contribute most to the discrepancy.
• Standardised residuals may be examined informally to highlight which categories deviate most from expectation.
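One informal way to look for such clues is sketched below. The die-roll counts are hypothetical, and the residual used here is the Pearson form (O − E)/√E, which is an assumption about which kind of standardised residual is intended.

```python
import math

# Hypothetical data: 60 die rolls against a fair-die model (expected count 10 per face).
observed = [8, 9, 13, 7, 15, 8]
expected = [10.0] * 6

# Each category's contribution to the chi-square statistic: (O - E)^2 / E.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
# Pearson residual (O - E) / sqrt(E): its sign shows the direction of the deviation.
residuals = [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]

for face, (c, r) in enumerate(zip(contributions, residuals), start=1):
    print(f"face {face}: contribution = {c:.2f}, residual = {r:+.2f}")

# Face numbers are 1-based, so shift the 0-based index by one.
largest = max(range(len(observed)), key=lambda i: contributions[i]) + 1
print(f"largest contribution comes from face {largest}")
```

This kind of inspection is descriptive only: it points to where the discrepancy is concentrated but is not itself a formal test of any single category.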
Yes, a result can be statistically significant even when the deviations look small. With a large sample size, even small differences between observed and expected counts may become statistically significant.
A small p-value reflects how unlikely the observed deviations are under the null hypothesis, not necessarily how dramatic they appear.
This distinction highlights the importance of considering both statistical evidence and practical relevance.
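The sample-size effect can be illustrated with a small sketch. The counts and claimed proportions are hypothetical, and the example is restricted to three categories because, with 2 degrees of freedom, the right-tail probability has the simple closed form exp(−x/2).

```python
import math


def gof_p_value_three_categories(observed, expected_props):
    """Chi-square statistic and p-value for exactly three categories (df = 2).

    With df = 2 the right-tail probability has the closed form exp(-x / 2).
    """
    if len(observed) != 3 or len(expected_props) != 3:
        raise ValueError("this sketch only handles three categories")
    n = sum(observed)
    expected = [n * p for p in expected_props]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, math.exp(-statistic / 2.0)


expected_props = [0.5, 0.3, 0.2]
base_counts = [51, 29, 20]  # hypothetical sample of 100: proportions 0.51 / 0.29 / 0.20

results = {}
for scale in (1, 200):
    # Scaling every count keeps the sample proportions identical.
    observed = [count * scale for count in base_counts]
    statistic, p_value = gof_p_value_three_categories(observed, expected_props)
    results[sum(observed)] = (statistic, p_value)
    print(f"n = {sum(observed)}: chi-square = {statistic:.3f}, p-value = {p_value:.4f}")
```

The sample proportions are identical in both runs, yet only the larger sample yields a p-value below 0.05: the statistic grows in proportion to the sample size while the size of the deviation stays the same.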
Failing to reject the null hypothesis suggests the model is plausible given the sample data, but does not confirm it as correct.
The test may lack sensitivity if the sample size is small or if the deviations are subtle.
Thus, the conclusion is one of insufficient evidence rather than validation of the model itself.
Practice Questions
Question 1 (1–3 marks)
A chi-square goodness-of-fit test is carried out to compare observed category counts with expected counts. The resulting p-value is 0.032, and the significance level is 0.05.
State the appropriate inference decision and briefly justify it.
Question 1
• 1 mark: Identifies decision: reject the null hypothesis OR reject H0.
• 1 mark: Correct justification that p-value (0.032) is less than significance level (0.05).
• 1 mark: States that there is sufficient evidence to reject the null hypothesis OR evidence suggests the distribution differs from expectations.
Question 2 (4–6 marks)
A researcher performs a chi-square goodness-of-fit test to determine whether a population’s distribution of preferred transport methods (car, bus, walking, cycling) matches a claimed distribution. The p-value obtained is 0.18, using a significance level of 0.05.
(a) State the correct inference decision.
(b) Explain what this decision means in the context of the population.
(c) Comment on whether this decision provides evidence that the claimed distribution is correct.
Question 2
(a)
• 1 mark: Correct decision: fail to reject the null hypothesis OR do not reject H0.
(b)
• 1–2 marks: Explanation that the data do not provide strong enough evidence that the population's distribution of preferred transport methods differs from the claimed distribution.
(c)
• 1–2 marks: Recognises that failing to reject H0 does not prove the claimed distribution is correct; it only indicates insufficient evidence to conclude that it is wrong.
• 1 mark: Clear statement that random variation could explain the observed differences.
