AP Statistics study notes

5.5.2 Normal Approximation of Sampling Distributions

AP Syllabus focus:
‘The sampling distribution of the sample proportion (p-hat) for a categorical variable will be approximately normally distributed if the sample size is large enough, specifically if np ≥ 10 and n(1-p) ≥ 10. This criteria ensures that the normal approximation is valid for calculating probabilities and making inferences.’

A large enough sample size allows the sampling distribution of a sample proportion to behave predictably, letting statisticians use the normal distribution to estimate important probability statements.

Normal Approximation in Context

The normal approximation for a sampling distribution of a sample proportion is a powerful tool that simplifies probability calculations when dealing with categorical data. The key idea is that, under appropriate conditions, the distribution of the sample proportion ($\hat{p}$) becomes approximately normal, even though the underlying variable is categorical rather than numerical. This approximation allows us to use familiar tools such as the standard normal curve, $z$-scores, and probability tables or technology.

Before using this approximation, students must understand that a sample proportion is a statistic that estimates a population proportion. Because each sample produces a potentially different value of $\hat{p}$, repeated sampling creates a distribution of possible sample proportions. When the sample size is large enough, this distribution takes on an approximately bell-shaped form.

A histogram of the sampling distribution of the sample proportion with a smooth normal curve superimposed, illustrating how repeated samples form an approximately normal pattern when conditions are met. Numerical labels shown represent just one simulation example and exceed syllabus requirements.

Conditions for Normal Approximation

A central requirement for using the normal approximation comes directly from the syllabus specification: the approximation is appropriate only when both $np \ge 10$ and $n(1 - p) \ge 10$. These criteria ensure that a sample is expected to contain enough observations in both the “success” and “failure” categories for the sampling distribution to avoid noticeable skewness or distortion.

Why These Conditions Matter

  • The condition $np \ge 10$ ensures the expected number of successes is sufficiently large.

  • The condition $n(1 - p) \ge 10$ ensures the expected number of failures is also sufficiently large.

  • Together, they indicate that the proportions from repeated samples will be spread roughly symmetrically around the true population proportion, with variability that behaves predictably.

These criteria apply specifically to categorical variables, where outcomes can be classified into two categories—often called “successes” and “failures,” but applicable to any binary classification. When these conditions are met, the sampling distribution becomes symmetric enough for a normal curve to model it accurately.
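The two conditions can be checked mechanically. The following is a minimal sketch, not part of the syllabus; the function name `success_failure_check` and the sample values are hypothetical and chosen only for illustration.

```python
def success_failure_check(n, p, threshold=10):
    """Return expected successes, expected failures, and whether both reach the threshold."""
    expected_successes = n * p
    expected_failures = n * (1 - p)
    ok = expected_successes >= threshold and expected_failures >= threshold
    return expected_successes, expected_failures, ok

# Hypothetical values for illustration only.
print(success_failure_check(50, 0.30))  # both expected counts are well above 10
print(success_failure_check(50, 0.10))  # expected successes fall below 10
```

In the second call the expected number of successes is only about 5, so the normal model would not be justified even though the expected number of failures is large.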

Structure of the Sampling Distribution Under Normal Approximation

Even though students should not perform calculations in these notes, they should understand the components that shape the approximated distribution. Under the conditions stated above, the sampling distribution of $\hat{p}$ is centered at the population proportion and has a predictable spread.

EQUATION

$\text{Mean}(\hat{p}) = p$
$p$ = Population proportion (unitless)

$\text{SD}(\hat{p}) = \sqrt{\dfrac{p(1-p)}{n}}$
$n$ = Sample size (number of observations)

These expressions define the location and spread of the approximated normal curve. Once the distribution is approximated as normal, probability statements about the value of $\hat{p}$ can be made by converting sample proportions to $z$-scores and finding associated areas under the standard normal curve.
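As an illustration of the conversion to a $z$-score, the sketch below uses only Python's standard library. The scenario ($p = 0.40$, $n = 150$, and the cutoff 0.35) is hypothetical, as are the function names; the standard normal area is computed from the error function rather than a printed table.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z), via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def prob_p_hat_at_most(x, n, p):
    """Approximate P(p-hat <= x) using a normal model with mean p and SD sqrt(p(1-p)/n)."""
    sd = math.sqrt(p * (1 - p) / n)
    z = (x - p) / sd
    return z, normal_cdf(z)

# Hypothetical scenario: np = 60 and n(1 - p) = 90, so both conditions hold.
z, prob = prob_p_hat_at_most(0.35, 150, 0.40)
print(round(z, 2), round(prob, 3))
```

Here the standard deviation is 0.04, so a sample proportion of 0.35 sits 1.25 standard deviations below the mean, and the area to its left is a little over 10%.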

Understanding the Behavior of Sample Proportions

The sampling distribution of a sample proportion reflects the idea that statistics vary from sample to sample. Even when all samples come from the same population, randomness introduces natural variation. When the conditions are met and the normal approximation applies, this variation becomes structured enough to analyze using standard probability tools.

Key characteristics under normal approximation

  • Center: The distribution is centered at the population proportion, meaning $\hat{p}$ is an unbiased estimator.

  • Spread: Larger samples reduce the standard deviation, making the sampling distribution tighter around the true value.

Three sampling distributions of the sample proportion for increasing sample sizes, each approximately normal. As sample size grows, the distributions narrow, demonstrating the decreasing standard deviation of $\hat{p}$. Exact numerical labels are example-based and extend beyond syllabus requirements.

  • Shape: When the success–failure conditions are satisfied, the distribution becomes approximately bell-shaped, allowing the use of normal techniques.

These features support valid statistical reasoning about proportions, enabling students to judge how unusual or typical a sample result may be.
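The center and spread described above can be seen in a small simulation. This is a rough sketch under assumed values ($p = 0.3$, 2000 repeated samples, a fixed random seed); the function name `simulate_p_hats` is hypothetical, and the simulated figures will vary slightly with the seed.

```python
import math
import random
import statistics

def simulate_p_hats(n, p, reps, seed=0):
    """Draw `reps` random samples of size n and return the sample proportion from each."""
    rng = random.Random(seed)
    p_hats = []
    for _ in range(reps):
        successes = sum(1 for _trial in range(n) if rng.random() < p)
        p_hats.append(successes / n)
    return p_hats

p = 0.3  # assumed population proportion
for n in (50, 200, 800):
    p_hats = simulate_p_hats(n, p, reps=2000)
    theoretical_sd = math.sqrt(p * (1 - p) / n)
    # Columns: n, simulated mean, simulated SD, SD from the formula
    print(n, round(statistics.mean(p_hats), 3),
          round(statistics.stdev(p_hats), 4), round(theoretical_sd, 4))
```

The simulated mean should stay close to $p$ for every sample size, while the simulated standard deviation should shrink as $n$ grows and track the value given by the formula.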

Using the Normal Approximation Responsibly

Although using the normal distribution greatly simplifies inference, it must be applied only when justified. Misapplication can lead to inaccurate probability statements or incorrect interpretations.

Essential safeguards

  • Always verify the success–failure conditions before using the normal model.

  • Recognize that skewed or small populations can distort the distribution when conditions are not met.

  • Understand that the approximation improves as the sample size increases; borderline cases may still show slight asymmetry.

  • Keep in mind that this approximation describes the distribution of the statistic, not the distribution of individual observations.
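The remark about slight asymmetry in borderline cases can be illustrated with exact binomial probabilities. The sketch below is an assumption-laden example, not a required method: it compares the two tails at equal distances from the expected count, which a perfectly symmetric model would make equal. The function names and the chosen values of $n$, $p$, and the cutoffs are hypothetical.

```python
import math

def binom_pmf(k, n, p):
    """Exact binomial probability of exactly k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def exact_tails(n, p, k_low, k_high):
    """Return P(X <= k_low) and P(X >= k_high) for a binomial count X."""
    lower = sum(binom_pmf(k, n, p) for k in range(0, k_low + 1))
    upper = sum(binom_pmf(k, n, p) for k in range(k_high, n + 1))
    return lower, upper

# Borderline case: n = 100, p = 0.10, so np is exactly 10.
# Cutoffs 5 and 15 are each 5 successes away from the expected count of 10.
print(exact_tails(100, 0.10, 5, 15))

# Comfortable case: n = 100, p = 0.50; cutoffs 45 and 55 are each 5 from 50.
print(exact_tails(100, 0.50, 45, 55))
```

In the borderline case the upper tail is heavier than the lower tail, a sign of residual right skew, whereas with $p = 0.50$ the two tails match.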

Interpretive Implications

Using the normal approximation allows students to translate questions about sample proportions into questions about areas under the normal curve. This makes it possible to evaluate how likely a sample proportion is, given the true population proportion, and to make inferential judgments about whether observed sample results are consistent with expectations.

Interpretation requires attention to context, units, and the connection between sample-level statistics and population-level parameters. When conditions permit, the normal approximation provides a reliable, intuitive framework for studying sampling variability in categorical settings.

FAQ

The conditions are guidelines rather than absolute rules, but relaxing them increases the risk of inaccurate probability estimates.

In practice, statisticians may accept slightly lower values if the sample proportion is not extremely close to 0 or 1, as the distribution may still be roughly symmetric.
However, for AP-level work, you should apply the 10-success, 10-failure rule consistently, as it provides reliable justification without needing advanced reasoning.

Larger samples reduce the influence of random fluctuations in individual outcomes, making the distribution of sample proportions cluster more tightly around the true population proportion.

This clustering produces a shape much closer to a bell curve, even when the underlying categorical process is inherently non-normal.

The greater the sample size, the smaller the impact of any single observation, which smooths out irregularities.

Yes, but only if the sample size is extremely large.

When the population proportion is close to 0 or 1, the distribution of outcomes is naturally skewed.
A sufficiently large sample size compensates by ensuring enough expected successes and failures for approximate symmetry.

If the required sample size becomes impractically large, the normal approximation may no longer be appropriate.
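To see how quickly the required sample size grows as the population proportion approaches 0 or 1, the smallest $n$ satisfying both conditions can be computed directly. This is a minimal sketch; the function name `min_sample_size` is hypothetical, and the intermediate rounding is there only to guard against floating-point noise.

```python
import math

def min_sample_size(p, threshold=10):
    """Smallest n for which both n*p and n*(1-p) are at least the threshold."""
    rarer = min(p, 1 - p)  # proportion of the less common outcome
    # Round before taking the ceiling so tiny floating-point errors do not add 1.
    return math.ceil(round(threshold / rarer, 9))

for p in (0.5, 0.1, 0.01):
    print(p, min_sample_size(p))
```

With $p = 0.5$ a sample of 20 is enough, but with $p = 0.01$ the conditions require a sample of 1000.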

The approximation allows you to convert a sample proportion into a measure of how far it lies from the expected value.

This helps determine whether the observed proportion is plausible given the population proportion or whether it suggests meaningful deviation.

Using a bell-shaped model also makes it easier to compare different samples on a common statistical scale.

Scenarios are unsuitable for the normal approximation when outcomes are extremely rare or extremely common, meaning the population proportion is close to 0 or 1 and the sample is not large enough to compensate.

They are also unsuitable when the available sample size is small, or when the population is too tiny for meaningful random sampling.

Situations involving highly clustered outcomes or strong dependence between observations further reduce the reliability of the approximation.

Practice Questions

Question 1 (1–3 marks)
A large school reports that 62% of its students regularly complete their homework. A random sample of 80 students is selected.
State whether the sampling distribution of the sample proportion of students who regularly complete their homework can be approximated by a normal distribution. Justify your answer using the required conditions.

Question 1 (1–3 marks)
• 1 mark: States that the normal approximation is appropriate.
• 1 mark: Checks the condition np ≥ 10 (80 × 0.62 = 49.6, which satisfies the condition).
• 1 mark: Checks the condition n(1 - p) ≥ 10 (80 × 0.38 = 30.4, which also satisfies the condition).
Full marks require both conditions to be verified and a clear conclusion.

Question 2 (4–6 marks)
A charity estimates that 48% of local residents support a new community initiative. To assess this estimate, researchers take a random sample of 200 residents and record the sample proportion who support the initiative.

a) Explain why it is appropriate to use a normal approximation for the sampling distribution of the sample proportion in this context.
b) Describe the centre (mean) and spread (standard deviation) of this sampling distribution.
c) Comment on how the shape of the sampling distribution would change if the sample size were reduced to 40 residents.

Question 2 (4–6 marks)

a) (2 marks)
• 1 mark: States that the conditions for normal approximation are met.
• 1 mark: Correctly checks both conditions:

  • np = 200 × 0.48 = 96 ≥ 10

  • n(1 - p) = 200 × 0.52 = 104 ≥ 10

b) (2 marks)
• 1 mark: States the mean of the sampling distribution is equal to the population proportion (0.48).
• 1 mark: States the standard deviation formula or value:
standard deviation = sqrt[p(1 - p)/n] = sqrt[0.48 × 0.52 / 200] ≈ 0.0353.

c) (2 marks)
• 1 mark: States that decreasing the sample size increases variability (wider spread).
• 1 mark: States that with n = 40, the success–failure conditions are still met (np = 40 × 0.48 = 19.2 and n(1 - p) = 40 × 0.52 = 20.8), so the distribution remains approximately normal, although the approximation is rougher than with n = 200.

Total: 6 marks
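The quantities used in Question 2 can be checked numerically. The sketch below is only a verification aid, not part of the mark scheme; the function name `describe` is hypothetical.

```python
import math

def describe(n, p):
    """Expected counts, mean, and standard deviation for the sampling distribution of p-hat."""
    return {
        "np": n * p,
        "n(1-p)": n * (1 - p),
        "mean": p,
        "sd": math.sqrt(p * (1 - p) / n),
    }

for n in (200, 40):
    summary = describe(n, 0.48)
    print(n, {name: round(value, 4) for name, value in summary.items()})
```

With these figures, both expected counts remain at least 10 for n = 40, while the standard deviation more than doubles (from about 0.035 to about 0.079).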
