AP Statistics study notes

5.5.1 Determining Parameters of Sampling Distributions

AP Syllabus focus:
‘For independent samples of a categorical variable from a population with population proportion p, the sampling distribution of the sample proportion (p-hat) has a mean (mean of p-hat = p) and a standard deviation (sigma of p-hat = sqrt[(p(1-p))/n]), assuming sampling with replacement. For sampling without replacement, the standard deviation of the sample proportion is smaller than this formula suggests, but if the sample size is less than 10% of the population size, the difference is considered negligible.’

The behavior of a sample proportion reveals how population characteristics manifest across repeated samples, helping students understand why sample results vary and how that variation can be modeled predictably.

Determining Parameters of Sampling Distributions for Sample Proportions

Understanding the parameters of the sampling distribution of a sample proportion is essential because these parameters describe how the statistic behaves across all possible random samples of the same size. For AP Statistics, the two key parameters are the mean of the sampling distribution and its standard deviation, each playing a different but complementary role in characterizing the distribution.

The Role of the Sample Proportion

A sample proportion, denoted $\hat{p}$, represents the proportion of individuals in a sample who exhibit a particular categorical characteristic. It serves as an estimate of a population proportion, denoted $p$, which measures the true proportion in the entire population. Because samples differ, sample proportions will vary from one sample to another, producing a predictable distribution when samples are independent and identically sized.

Sample Proportion ($\hat{p}$): The proportion of individuals in a sample who fall into a specified category, calculated as the number of “successes” divided by the sample size.

This sampling variation leads to a distribution whose center and spread reveal important information about how closely sample results cluster around the population truth.

Figure: This histogram shows a simulated sampling distribution of the sample proportion centered near the population proportion $p$. The bell-shaped form reflects typical sampling variability when independence and sample-size conditions are met. Although the original source introduces normal approximation, here it visually supports understanding of the mean and standard deviation of the sampling distribution.

Mean of the Sampling Distribution

The mean of the sampling distribution of $\hat{p}$ gives the long-run average of sample proportions over many random samples. According to the specification, this mean equals the true population proportion $p$, making the sample proportion an unbiased estimator of the population proportion.

EQUATION

$$\mu_{\hat{p}} = p$$
$p$ = Population proportion (unitless)

This property ensures that, over repeated sampling, the average of all sample proportions will match the true parameter. This unbiasedness is a cornerstone of inferential methods that rely on sample proportions.
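The unbiasedness property can be checked with a small simulation. The sketch below is illustrative only: the values p = 0.3, n = 50, and 5,000 repeated samples are assumptions chosen for the demonstration, and the helper name `simulate_sample_proportions` is hypothetical. It draws each observation independently, which mimics sampling with replacement.

```python
import random
import statistics


def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples independent samples of size n and return each sample's p-hat."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    p_hats = []
    for _ in range(num_samples):
        # Each observation is a "success" with probability p.
        successes = sum(rng.random() < p for _ in range(n))
        p_hats.append(successes / n)
    return p_hats


p = 0.3  # assumed population proportion (illustrative)
p_hats = simulate_sample_proportions(p=p, n=50, num_samples=5000)
mean_p_hat = statistics.mean(p_hats)

print(f"population proportion p      = {p}")
print(f"average of simulated p-hats  = {mean_p_hat:.4f}")
```

Individual sample proportions scatter above and below p, but their average should land very close to it.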

Standard Deviation of the Sampling Distribution

The standard deviation of $\hat{p}$ describes how much sample proportions vary from sample to sample. The syllabus specifies that when sampling with replacement, or when sampling without replacement from a large population, the standard deviation depends on the population proportion and the sample size.

EQUATION

$$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$$
$p$ = Population proportion
$n$ = Sample size (number of observations)

The formula reflects two intuitive principles. The product p(1 − p) is largest when p = 0.5, and n appears in the denominator, so:

  • Populations with more balanced outcomes lead to more uncertain sample results.

  • Larger samples stabilize estimates by averaging across more observations.
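Both principles can be seen by evaluating the formula directly. This is a minimal sketch; the function name `sd_p_hat` and the particular values of p and n are assumptions chosen for illustration.

```python
import math


def sd_p_hat(p, n):
    """Standard deviation of the sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


# Principle 1: with n fixed, the spread is largest when p = 0.5.
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"n = 100, p = {p}: sd = {sd_p_hat(p, 100):.4f}")

# Principle 2: with p fixed, larger samples give a smaller spread.
for n in (25, 100, 400):
    print(f"p = 0.5, n = {n}: sd = {sd_p_hat(0.5, n):.4f}")
```

Because n sits under a square root, quadrupling the sample size only halves the standard deviation.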

The 10% Condition and Sampling Without Replacement

While the standard deviation formula assumes sampling with replacement, real-world data collection often involves sampling without replacement. In these cases, observations are not perfectly independent, and the true standard deviation is slightly smaller than the formula gives. However, the specification states that this difference is negligible when the sample size is less than 10% of the total population size.

This condition supports the continued use of the same standard deviation formula in practical settings such as surveys, experiments, and observational studies where the population is large relative to the sample.
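To see how small the difference is, the sketch below compares the usual formula with the exact standard deviation for sampling without replacement. The exact version uses the finite population correction factor sqrt((N − n)/(N − 1)), a standard result from sampling theory that is not part of the syllabus statement and is not required for AP Statistics. The function names and the values p = 0.5 and n = 100 are assumptions chosen for illustration.

```python
import math


def sd_with_replacement(p, n):
    """Usual formula: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


def sd_without_replacement(p, n, population_size):
    """Usual formula multiplied by the finite population correction."""
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    return sd_with_replacement(p, n) * fpc


p, n = 0.5, 100  # illustrative values
for population_size in (200, 1_000, 10_000, 1_000_000):
    ratio = sd_without_replacement(p, n, population_size) / sd_with_replacement(p, n)
    fraction_sampled = n / population_size
    print(f"N = {population_size:>9,}: sample is {fraction_sampled:.2%} of population, "
          f"exact sd / formula sd = {ratio:.4f}")
```

When the sample is exactly 10% of the population, the exact standard deviation is only about 5% smaller than the formula's value, and the gap shrinks quickly as the population grows.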

Ensuring Independence and Appropriate Use

For the formulas and interpretations to hold, the observations within a sample must be independent, or close enough to independent that the difference does not matter. This holds when:

  • Sampling is conducted with replacement, or

  • Sampling is without replacement but the 10% condition is satisfied.

Independence ensures that the behavior of one sampled individual does not meaningfully influence another, preserving the theoretical properties of the sampling distribution.

Why These Parameters Matter

Knowing the mean and standard deviation of the sampling distribution of $\hat{p}$ enables students to:

  • Predict how much variation to expect in sample results

  • Assess how typical or unusual certain sample proportions are

  • Understand why different samples produce different estimates even when drawn from the same population

  • Prepare for later topics involving normal approximations, confidence intervals, and hypothesis testing

The parameters articulate both the expected value of a sample proportion and the level of uncertainty associated with sampling. These insights form the backbone of statistical reasoning involving categorical data.

Key Takeaways for AP Statistics

  • The sample proportion is an unbiased estimator of the population proportion.

  • The variability of $\hat{p}$ decreases as sample size increases.

Figure: This figure displays sampling distributions of $\hat{p}$ for increasing sample sizes $n$, illustrating the decreasing spread predicted by the formula for $\sigma_{\hat{p}}$. As $n$ grows, the distributions become narrower and more concentrated around the true proportion. The figure also shows the shape becoming more nearly normal, which is slightly beyond the scope of this subtopic but helpful for conceptual understanding.

  • The standard deviation formula applies under sampling with replacement or when the sample is less than 10% of the population.

  • These parameters guide how we model and interpret sample-based estimates in real data settings.

FAQ

When p is extremely small or extremely large, the sampling distribution becomes noticeably skewed, especially for modest sample sizes. This is because a sample proportion cannot fall below 0 or rise above 1, so when p sits close to one of these boundaries the distribution piles up against it and has a longer tail on the other side.

To stabilise the distribution, a much larger sample size is usually required.
• When p is near 0 or 1, the product p(1 − p) becomes small, reducing the standard deviation.
• Despite this, skewness may remain until the sample size is sufficiently large for sample proportions to vary meaningfully.
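The skewness described above can be quantified. The sketch below computes the skewness of the sample proportion exactly from binomial probabilities, assuming independent observations. The function name `p_hat_skewness` and the values of p and n are assumptions chosen for illustration, and skewness coefficients are not part of the AP syllabus.

```python
import math


def p_hat_skewness(p, n):
    """Skewness of p-hat = X / n, where X is binomial(n, p), computed from the exact pmf."""
    values = [k / n for k in range(n + 1)]
    probs = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(v * w for v, w in zip(values, probs))
    variance = sum((v - mean) ** 2 * w for v, w in zip(values, probs))
    third_moment = sum((v - mean) ** 3 * w for v, w in zip(values, probs))
    return third_moment / variance**1.5


# Skewness near 0 means roughly symmetric; positive means a longer right tail.
for p, n in ((0.5, 20), (0.05, 20), (0.05, 200)):
    print(f"p = {p}, n = {n}: skewness = {p_hat_skewness(p, n):.3f}")
```

With p = 0.05 the distribution is clearly right-skewed at n = 20 and much closer to symmetric at n = 200, while p = 0.5 gives a symmetric distribution even for a small sample.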

Increasing the sample size does more than just reduce the standard deviation; it also increases the number of possible values the sample proportion can take. With a small n, the statistic can jump only in large increments (for example, with n = 20 the possible values are 0.05 apart: 0.10, 0.15, 0.20). With a larger n, these steps become much smaller.

A larger sample therefore produces
• a tighter clustering around the population proportion
• more finely spaced potential values, making the distribution smoother and more continuous in appearance
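The spacing of possible values follows from the fact that a sample proportion must equal k/n for some whole number of successes k. A minimal sketch, with the sample sizes 20 and 200 assumed purely for illustration and `possible_p_hats` a hypothetical helper name:

```python
from fractions import Fraction


def possible_p_hats(n):
    """All values the sample proportion can take for a sample of size n: 0/n, 1/n, ..., n/n."""
    return [Fraction(k, n) for k in range(n + 1)]


for n in (20, 200):
    values = possible_p_hats(n)
    step = values[1] - values[0]  # gap between neighbouring values, always 1/n
    print(f"n = {n}: {len(values)} possible values, step size = {float(step)}")
```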

The centre of the sampling distribution depends on the expectation of the sample proportion, not the shape of the underlying population. As long as samples are random and each individual has the correct probability of being a “success”, the long-run average of sample proportions will remain equal to p.

This result holds regardless of population skewness because the definition of a proportion inherently relies on counting binary outcomes rather than measuring a numerical variable.

Real-world sampling rarely behaves perfectly. Several issues can inflate or deflate variability relative to the theoretical model:

• Non-random sampling can introduce dependencies between observations.
• Response bias or undercoverage can distort the proportion of observed “successes”.
• Clustering (for example, sampling households instead of individuals) reduces independence.
• Very small populations or sampling more than 10% of the population alters the standard deviation.

These effects typically move the observed variability away from the theoretical standard deviation sqrt[p(1 − p)/n].

Independence ensures that each observation contributes unique information. Without it, the variability of the sample proportion does not follow the expected formula and may be either inflated or suppressed.

When observations are dependent,
• success outcomes may cluster more tightly or more loosely than assumed
• the binomial model underlying the sampling distribution breaks down
• the sampling distribution may not be predictable enough for inference

Ensuring independence, whether through randomisation or the 10% condition, preserves the validity of the theoretical parameters.

Practice Questions

Question 1 (1–3 marks)
A large population contains a true proportion p = 0.42 of households with at least one pet cat. A researcher selects a simple random sample of 80 households and records the sample proportion of households with at least one cat.
(a) State the mean of the sampling distribution of the sample proportion.
(b) State the standard deviation of the sampling distribution of the sample proportion, assuming sampling with replacement.
(c) Explain why it is acceptable to use the standard deviation formula for sampling with replacement even if the researcher actually samples without replacement.

Question 1

(a) 1 mark
• Correctly states that the mean of the sampling distribution is p = 0.42.

(b) 1 mark
• Correctly states the standard deviation as sqrt[p(1 - p) / n] = sqrt[0.42 × 0.58 / 80].

(c) 1 mark
• States that the formula is still appropriate because the sample size is less than 10% of the population, so the difference from sampling without replacement is negligible.

Total: 3 marks
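The mark scheme leaves the standard deviation unevaluated. As a check on the arithmetic, the sketch below evaluates parts (a) and (b) with the values given in Question 1 and works out how large the population must be for the 10% condition in part (c). The variable names are illustrative.

```python
import math

p = 0.42  # population proportion given in Question 1
n = 80    # sample size given in Question 1

mean_p_hat = p                         # part (a): mean of the sampling distribution
sd_p_hat = math.sqrt(p * (1 - p) / n)  # part (b): sqrt(0.42 * 0.58 / 80)
min_population = 10 * n                # part (c): n is under 10% of N when N exceeds 10n

print(f"(a) mean = {mean_p_hat}")
print(f"(b) standard deviation = {sd_p_hat:.4f}")
print(f"(c) 10% condition holds for populations larger than {min_population} households")
```

The standard deviation comes to about 0.0552, and the 10% condition holds for any population of more than 800 households, which is consistent with the question's description of a large population.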

Question 2 (4–6 marks)
A charity claims that 30% of adults regularly volunteer. A student plans to take a simple random sample of 200 adults to estimate the proportion who volunteer.
(a) Identify the parameter and the statistic in this context.
(b) Determine the mean and standard deviation of the sampling distribution of the sample proportion.
(c) Explain how the sample size affects the variability of the sampling distribution.
(d) The student suggests that the standard deviation formula might not apply because the sampling will be done without replacement. Discuss whether this concern is valid.

Question 2

(a) 1 mark
• Parameter: the true proportion of adults who regularly volunteer.
• Statistic: the sample proportion of adults who volunteer in the student’s sample.
(1 mark for identifying both correctly)

(b) 2 marks
• 1 mark: States mean of the sampling distribution is p = 0.30.
• 1 mark: States standard deviation as sqrt[p(1 - p) / n] = sqrt[0.30 × 0.70 / 200].

(c) 1 mark
• Explains that larger sample sizes reduce the variability of the sampling distribution, making sample proportions more tightly clustered around p.

(d) 1–2 marks
• 1 mark: Notes that sampling without replacement slightly reduces variability.
• 1 mark: Explains that the effect is negligible if the sample size is less than 10% of the population, so the usual standard deviation formula is acceptable.

Total: 5–6 marks
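For reference, the sketch below evaluates part (b) numerically with the values given in Question 2 and illustrates part (c) by recomputing the standard deviation for other sample sizes. The comparison sizes 50 and 800 are assumptions added for illustration and are not part of the question.

```python
import math

p = 0.30  # claimed population proportion from Question 2
n = 200   # planned sample size from Question 2

mean_p_hat = p
sd_p_hat = math.sqrt(p * (1 - p) / n)  # part (b): sqrt(0.30 * 0.70 / 200)

# Part (c): the same formula at smaller and larger (illustrative) sample sizes.
sd_by_n = {size: math.sqrt(p * (1 - p) / size) for size in (50, 200, 800)}

print(f"(b) mean = {mean_p_hat}, standard deviation = {sd_p_hat:.4f}")
for size, sd in sd_by_n.items():
    print(f"(c) n = {size}: standard deviation = {sd:.4f}")
print(f"(d) 10% condition holds for populations larger than {10 * n} adults")
```

The standard deviation for n = 200 is about 0.0324. For part (d), the usual formula is acceptable whenever the adult population exceeds 2,000.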
