Understanding the Null Distribution (6.5.2) | AP Statistics Notes

AP Syllabus focus:
‘The null distribution is the theoretical distribution of the test statistic under the assumption that the null hypothesis is true. For tests involving proportions, this is often modeled using a normal distribution (z-distribution).’

Understanding the null distribution is essential because it provides the reference framework for evaluating how unusual an observed test statistic is when the null hypothesis is assumed true.

The Role of the Null Distribution in Inference

In hypothesis testing for a population proportion, the null distribution represents what values of the test statistic would typically occur if the null hypothesis were actually true. Because the goal of a significance test is to measure how incompatible the sample evidence is with the null hypothesis, a clear understanding of this distribution is crucial for interpreting p-values and determining whether the observed data are statistically surprising.

When the null hypothesis specifies a single value for the population proportion, the sampling distribution of the test statistic under that assumption becomes predictable. This predictability is what allows statisticians to quantify evidence and assess whether the observed data deviate meaningfully from what the null hypothesis anticipates.

Why the Null Distribution Is Needed

The null distribution serves as the baseline that determines the probability of observing a test statistic at least as extreme as the one obtained from the sample. Interpreting a test result requires comparing the observed value of the statistic to this theoretical distribution. A proper understanding ensures that the reasoning is rooted in probability rather than subjective judgment.

Because the behavior of sample proportions becomes approximately normal under certain conditions, the null distribution in one-sample z-tests for proportions can be modeled using the standard normal distribution. This modeling allows the use of z-scores and corresponding tail probabilities.

Assumptions and Structure of the Null Distribution

The construction of the null distribution relies on strict adherence to the null hypothesis. For a claim about a population proportion, the null hypothesis specifies a value $p_0$, which is treated as the true proportion for all probability calculations.

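Before using the normal model, two conditions must be checked: independence (the data come from a random sample, and when sampling without replacement the sample is no more than 10% of the population) and normality (the sample is large enough for the expected counts of successes and failures under the null hypothesis to support a normal approximation). When both hold, the standardized test statistic behaves approximately like a standard normal variable, and conclusions drawn from the model are trustworthy.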

The Sampling Distribution Under the Null

When the null hypothesis is assumed true, the sampling distribution of the sample proportion is centered at $p_0$, not at the observed sample proportion.

Sampling distributions of a sample proportion for increasing sample sizes, illustrating how the distribution becomes more symmetric and approximately normal as n grows, consistent with the null distribution model.

EQUATION

$\hat{p} \sim N\left(p_0,\ \sqrt{\frac{p_0(1-p_0)}{n}}\right)$ (approximately, when the null hypothesis is true; the second parameter is the standard deviation)
$\hat{p}$ = Sample proportion
$p_0$ = Hypothesized population proportion
$n$ = Sample size

This model captures the expected spread in sample proportions if many samples of the same size were repeatedly drawn under the assumption that the population proportion truly equals $p_0$.
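The center and spread of this model can be computed directly from $p_0$ and $n$. The following is a minimal sketch, not part of the original notes; the function name `null_sampling_distribution` and the example values (p0 = 0.5, n = 100) are illustrative assumptions.

```python
import math

def null_sampling_distribution(p0, n):
    """Return (mean, standard deviation) of the normal model for the
    sample proportion when the null hypothesis p = p0 is assumed true."""
    if not 0 < p0 < 1:
        raise ValueError("p0 must be strictly between 0 and 1")
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    mean = p0                                  # centered at the hypothesized value
    std_dev = math.sqrt(p0 * (1 - p0) / n)     # spread depends only on p0 and n
    return mean, std_dev

# Hypothetical example: H0 says p = 0.5 and the sample has 100 observations.
mean, std_dev = null_sampling_distribution(0.5, 100)
print(f"mean = {mean:.2f}, standard deviation = {std_dev:.3f}")
# prints: mean = 0.50, standard deviation = 0.050
```

Note that the observed sample proportion never enters this calculation: the model is fixed by the null hypothesis and the sample size alone.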

A standardized test statistic can then be constructed, enabling direct comparison with values from the standard normal distribution.

Standardizing the Test Statistic Under the Null

The test statistic measures how far the observed sample proportion lies from what the null hypothesis predicts, expressed in standardized units. Because the standardized statistic approximately follows a standard normal distribution when the null hypothesis is true and the conditions are met, this standardization is what locates the observed result within the null distribution.

EQUATION

$z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$
$z$ = Standardized test statistic
$\hat{p}$ = Observed sample proportion
$p_0$ = Hypothesized population proportion
$n$ = Sample size

This equation forms the bridge between the raw sample data and the theoretical null distribution. It allows the transformation of the sample proportion into a z-score, which is then evaluated relative to the standard normal curve.
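As a hedged illustration of the formula above, the sketch below computes z for made-up values. The function name `z_statistic` and the numbers (an observed proportion of 0.56 in a sample of 200, tested against $p_0 = 0.50$) are assumptions chosen for the example, not data from these notes.

```python
import math

def z_statistic(p_hat, p0, n):
    """Standardized distance between the observed proportion and p0.

    The standard error uses p0, not p_hat, because every probability
    calculation in the test assumes the null hypothesis is true.
    """
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / standard_error

# Hypothetical example values.
z = z_statistic(p_hat=0.56, p0=0.50, n=200)
print(f"z = {z:.3f}")  # prints: z = 1.697
```

Here the observed proportion sits about 1.7 standard errors above the hypothesized value.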

Modeling the Null Distribution Using the Standard Normal Curve

Once standardized, the test statistic is compared to the standard normal distribution, often called the z-distribution.

Standard normal curve showing central probability and shaded tail regions beyond ±1.96, representing rare outcomes under the null distribution and illustrating how critical values relate to tail probabilities.

This curve represents the theoretical distribution assumed under the null hypothesis and is symmetrical, centered at zero, and follows known probability rules.

Because of its well-defined structure, the standard normal model makes it possible to compute p-values, which quantify how extreme the observed statistic is within the context of this theoretical distribution.

Key Features of the Null Distribution Modeled as a z-Distribution

• Centered at zero because the standardized value reflects differences measured relative to the null hypothesis.
• Symmetric, enabling equal treatment of positive and negative deviations.
• Tail areas correspond to p-values, allowing inference about the compatibility of the data with the null hypothesis.
• Shape determined solely by the normal model, not by the particular data set, ensuring consistency across applications.
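The link between tail areas and p-values can be sketched with the Python standard library alone. This is an illustrative sketch rather than a prescribed method: the helper names `standard_normal_cdf` and `p_value`, and the labels used for the alternative hypothesis, are assumptions made for the example.

```python
import math

def standard_normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def p_value(z, alternative="two-sided"):
    """Tail area of the standard normal curve beyond the observed z."""
    if alternative == "greater":      # Ha: p > p0, upper tail
        return 1 - standard_normal_cdf(z)
    if alternative == "less":         # Ha: p < p0, lower tail
        return standard_normal_cdf(z)
    if alternative == "two-sided":    # Ha: p != p0, both tails by symmetry
        return 2 * (1 - standard_normal_cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

print(f"{p_value(1.96):.4f}")  # prints: 0.0500
```

Because the curve is symmetric, the two-sided p-value is twice the area in one tail beyond |z|, which is why ±1.96 cuts off a total tail area of about 0.05.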

Using the Null Distribution to Evaluate Evidence

The null distribution does not describe the data themselves but instead provides the expected distribution of test statistics under the assumption of no effect or no difference. This theoretical framework is essential for interpreting test results because the meaning of “extreme” depends entirely on how the statistic would behave if the null hypothesis were true.

A small tail probability within this distribution indicates that the test statistic lies far from the typical values anticipated under the null hypothesis.

Composite illustration showing the connection between a simulated sampling distribution under the null hypothesis, its theoretical normal approximation, and the standard normal distribution used for computing p-values.

By grounding inference in the null distribution, significance testing maintains a consistent and objective method for evaluating whether observed differences are likely to reflect real effects rather than random sampling variation.

FAQ

The null distribution is constructed under the assumption that the null hypothesis is true, so its centre and spread are determined by the hypothesised proportion, not the observed data.

In contrast, estimation procedures such as confidence intervals are centred at the sample proportion and use it, rather than a hypothesised value, to estimate the standard error.
This difference reflects the distinct goals of testing versus estimation.

The null distribution represents what would happen in repeated sampling if the null hypothesis were true. It is anchored to the hypothesised proportion and therefore remains fixed.

If the observed proportion is far from this distribution's centre, the resulting test statistic becomes large in magnitude, producing a small p-value.
The shape and centre of the null distribution are unaffected by the sample outcome.

The normal model is unreliable when expected counts under the null hypothesis are too small. In practice, this means that n times the hypothesised proportion and n times its complement should both be at least 10.

If these conditions are not met, the distribution of the test statistic becomes skewed, and the z-approximation is inaccurate.

Alternative methods, such as exact binomial tests, may then be more appropriate.
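The expected-count check described above can be written as a small helper. This is a minimal sketch; the function name `normal_model_is_reasonable` and the example inputs are assumptions, and the threshold of 10 is the one stated in the text.

```python
def normal_model_is_reasonable(p0, n, minimum_count=10):
    """Check that the expected successes and failures under H0 are both
    at least `minimum_count` (10 in the rule given in the text)."""
    expected_successes = n * p0
    expected_failures = n * (1 - p0)
    return (expected_successes >= minimum_count
            and expected_failures >= minimum_count)

# Hypothetical examples:
print(normal_model_is_reasonable(p0=0.60, n=400))  # 240 and 160 expected -> True
print(normal_model_is_reasonable(p0=0.02, n=100))  # only 2 expected successes -> False
```

The check uses the hypothesised proportion, not the observed one, because the null distribution is built entirely on the assumption that the null hypothesis is true.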

A larger sample size reduces the standard error, making the null distribution narrower and more concentrated around its centre.

This leads to:
• greater sensitivity to differences from the null hypothesis
• larger test statistics for the same difference in proportions
• smaller p-values when evidence contradicts the hypothesised value

However, the centre of the null distribution remains fixed at the hypothesised proportion.
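The effect of sample size can be seen by holding the observed and hypothesised proportions fixed while n grows. The sketch below uses made-up values (an observed proportion of 0.55 against a hypothesised 0.50); the function name `z_by_sample_size` is an assumption for illustration.

```python
import math

def z_by_sample_size(p_hat, p0, sample_sizes):
    """Return (n, standard error, z) for each sample size, keeping the
    observed and hypothesised proportions fixed."""
    results = []
    for n in sample_sizes:
        standard_error = math.sqrt(p0 * (1 - p0) / n)
        z = (p_hat - p0) / standard_error
        results.append((n, standard_error, z))
    return results

# Hypothetical example: the same 0.05 difference at three sample sizes.
for n, se, z in z_by_sample_size(p_hat=0.55, p0=0.50, sample_sizes=[50, 200, 800]):
    print(f"n = {n:4d}  standard error = {se:.4f}  z = {z:.3f}")
```

Each fourfold increase in n halves the standard error and doubles z, while the centre of the null distribution stays at 0.50 throughout.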

Simulation allows students and researchers to see how the test statistic behaves under the null hypothesis by repeatedly generating samples that assume the null proportion is true.

This visual and empirical approach helps illustrate:
• how random variation creates the spread of the null distribution
• why unusually large or small statistics correspond to low p-values
• how theoretical approximations (such as the normal model) compare with generated data

Simulations are especially useful when theoretical conditions are borderline or difficult to verify.
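A simulation of this kind can be sketched with the standard library. The code below is an illustrative sketch, not a prescribed procedure: the function names, the seed, the number of repetitions, and the example values (null proportion 0.5, n = 100, observed z = 1.96) are all assumptions.

```python
import math
import random

def simulate_null_z(p0, n, repetitions=2000, seed=1):
    """Generate z statistics from samples drawn with true proportion p0."""
    rng = random.Random(seed)                  # fixed seed for reproducibility
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    z_values = []
    for _ in range(repetitions):
        # Each observation is a success with probability p0 (H0 is true).
        successes = sum(rng.random() < p0 for _ in range(n))
        z_values.append((successes / n - p0) / standard_error)
    return z_values

def simulated_p_value(z_values, observed_z):
    """Two-sided p-value: share of simulated z at least as extreme as observed."""
    extreme = sum(abs(z) >= abs(observed_z) for z in z_values)
    return extreme / len(z_values)

z_values = simulate_null_z(p0=0.5, n=100)
print(f"simulated two-sided p-value for z = 1.96: "
      f"{simulated_p_value(z_values, 1.96):.3f}")
```

The simulated proportion can then be compared with the tail area from the normal model; the two will generally be close but not identical, partly because counts are discrete while the normal curve is continuous.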

Practice Questions

Question 1 (1–3 marks)
A researcher is testing a claim about a population proportion using a one-sample z-test. Explain what is meant by the null distribution of the test statistic and state why it is important when determining a p-value.

Question 1 mark scheme
• 1 mark: Identifies that the null distribution is the theoretical distribution of the test statistic assuming the null hypothesis is true.
• 1 mark: Explains that it shows the values of the statistic that would be expected from random sampling if the null hypothesis were correct.
• 1 mark: States that it is used to determine how likely a test statistic at least as extreme as the observed one would be, which is essential for calculating the p-value.

Question 2 (4–6 marks)
A polling organisation claims that 60% of adults support a new policy. A random sample of 400 adults is taken, and a significance test is carried out for the population proportion.

a) Describe the null distribution of the test statistic under the assumption that the organisation’s claim is correct.
b) Explain how this null distribution is used to obtain the p-value in the test.
c) State what a small p-value indicates in the context of the null distribution.

Question 2 mark scheme

a)
• 1 mark: States that the null distribution is the sampling distribution of the test statistic when the null hypothesis (that the true proportion is 0.60) is assumed true.
• 1 mark: Mentions that under suitable conditions it is approximated by a normal distribution: the sample proportion is centred at the null proportion 0.60 (equivalently, the standardised z statistic is centred at 0).

b)
• 1 mark: States that the observed test statistic is compared with this null distribution.
• 1 mark: Explains that the p-value is the probability, under the null distribution, of obtaining a test statistic as extreme as or more extreme than the observed value.

c)
• 1 mark: States that a small p-value indicates the observed test statistic is unlikely under the null distribution.
• 1 mark (contextual): Explains that this provides evidence against the claim that the true proportion is 0.60.
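
As a numerical illustration for Question 2 (not part of the mark scheme), the null distribution can be worked out from the stated values $p_0 = 0.60$ and $n = 400$, assuming the sample is random and independent:

$n p_0 = 400(0.60) = 240 \ge 10 \quad \text{and} \quad n(1-p_0) = 400(0.40) = 160 \ge 10$
$\sqrt{\frac{p_0(1-p_0)}{n}} = \sqrt{\frac{0.60 \times 0.40}{400}} = \sqrt{0.0006} \approx 0.0245$

So, if the claim is correct, the sample proportion is approximately normal with centre 0.60 and standard deviation about 0.0245, and the standardised test statistic is approximately standard normal.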


Written by:

Dr Rahil Sachak-Patwa

Oxford University - PhD Mathematics

Rahil spent ten years working as a private tutor, teaching students for GCSEs, A-Levels, and university admissions. During his PhD he published papers on modelling infectious disease epidemics and was a tutor to undergraduate and masters students for mathematics courses.