AP Statistics study notes

6.2.2 Preliminary Checks

AP Syllabus focus:
‘Discuss the importance of preliminary checks for independence (random sampling or randomized experiment and the 10% condition) and sample size adequacy (np-hat and n(1-p-hat) are at least 10) to justify the use of the z-interval method.’

Preliminary checks ensure that a one-sample z-interval for a population proportion is statistically valid. These conditions protect against misleading inferences by confirming independence and adequate sample size.

Importance of Preliminary Checks

Before constructing a confidence interval for a population proportion, AP Statistics requires students to verify that essential assumptions are satisfied. These checks ensure that statistical conclusions rest on secure theoretical foundations. The two major requirements—independence and sample size adequacy—determine whether the sampling distribution of the sample proportion, p-hat, can be treated as approximately normal. The z-interval method depends on this normality assumption, so these conditions must be met to proceed responsibly.

Independence Condition

The independence requirement ensures that the observations in the sample do not influence each other. Without independence, the variability of the sample proportion may be underestimated or overestimated, leading to faulty interval estimates.

Random Sampling or Randomized Experiment

The first component of the independence check is confirming that the data arise from a random sample or a randomized experiment, both of which promote independence among observations. Random sampling gives each member of the population an equal chance of selection; random assignment gives each experimental unit an equal chance of receiving each treatment.

Random Sample: A sample selected so that every member of the population has an equal chance of being chosen.

Random selection or assignment helps minimize systematic bias and strengthens the justification for using probability-based inference methods such as the z-interval. It also aligns with the AP Statistics emphasis on data collection integrity.

Figure: A simple random sample is selected from a larger population, reinforcing that random selection supports the independence condition for valid z-interval inference.

The 10% Condition

When sampling is performed without replacement, the 10% condition is essential to maintain approximate independence among sample observations. This condition requires that the sample size be no more than 10% of the population size.

10% Condition: A requirement stating that when sampling without replacement, the sample size must be at most 10% of the population to reasonably assume independence.

This rule guards against excessive dependence between observations that occurs when large samples significantly deplete the population. After establishing this, the sampling structure can be considered close enough to independent for inference procedures.
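The 10% check reduces to a single comparison. The following is a minimal sketch; the function name and the example population size of 20,000 are illustrative assumptions, not values from these notes.

```python
def ten_percent_condition(n: int, population_size: int) -> bool:
    """Return True when the sample is at most 10% of the population.

    Integer arithmetic (10 * n <= N) avoids floating-point rounding
    at the exact 10% boundary.
    """
    if n <= 0 or population_size <= 0:
        raise ValueError("n and population_size must be positive")
    return 10 * n <= population_size


# Hypothetical example: 80 students sampled from a university of 20,000.
print(ten_percent_condition(80, 20_000))
```

In practice the population size is often only known roughly, so the comparison is usually made against a plausible lower bound for the population.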

Figure: The relationship between population and sample, illustrating how sampling without replacement requires limits like the 10% condition to maintain approximate independence for inference.

Sample Size Adequacy

Once independence is verified, AP Statistics requires checking whether the sample size is large enough for the sampling distribution of p-hat to be modeled by a normal distribution. This supports the use of z-critical values when creating confidence intervals.

Success–Failure Condition

The success–failure condition ensures that the distribution of the sample proportion is approximately normal. It requires that both the expected number of successes and the expected number of failures in the sample meet minimum thresholds.

Success–Failure Condition: A rule stating that the values $n\hat{p}$ and $n(1-\hat{p})$ must each be at least 10 to justify the normal approximation for the sampling distribution of the sample proportion.

This requirement reflects the idea that binomial distributions with sufficiently large expected counts become nearly symmetric, allowing normal approximations. If either expected count is too small, the sampling distribution of p-hat becomes skewed, making the z-interval inappropriate. The AP Statistics curriculum explicitly emphasizes this threshold to maintain reliable inference.

$\text{Expected successes} = n\hat{p}$
$n$ = Sample size
$\hat{p}$ = Sample proportion

$\text{Expected failures} = n(1-\hat{p})$
$1-\hat{p}$ = Complement of sample proportion

These expressions quantify whether the sample contains enough information to support a normal model. Both must reach at least 10 to proceed with the z-interval for a population proportion. This requirement is directly tied to the reliability of critical values, margin of error, and the overall confidence interval estimate.
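As a rough illustration, the check can be written in a few lines. Because p-hat is the observed number of successes divided by n, the quantity n·p-hat is simply the observed success count and n(1 − p-hat) is the observed failure count. The function name and the example counts (45 successes in 200) are hypothetical.

```python
def success_failure_condition(successes: int, n: int, threshold: int = 10) -> bool:
    """Return True when both n*p_hat and n*(1 - p_hat) are at least `threshold`.

    With p_hat = successes / n, n*p_hat equals the observed number of
    successes and n*(1 - p_hat) equals the observed number of failures.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    failures = n - successes
    return successes >= threshold and failures >= threshold


# Hypothetical sample: 45 successes out of 200 observations.
print(success_failure_condition(45, 200))
```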

Why These Checks Matter in Inference

Preliminary checks justify the use of the z-interval by validating its assumptions. When independence and adequate sample size are confirmed, the sampling distribution of p-hat behaves predictably. This predictability allows statisticians to apply theoretical tools from the normal distribution to estimate a population proportion with a specified level of confidence.

Connection to the Sampling Distribution

The z-interval formula depends on the assumption that the sampling distribution of p-hat is approximately normal with a calculable standard error. If preliminary checks are violated, the distribution may not approximate normality, making the critical value and resulting confidence interval inaccurate.
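For reference, the interval whose validity these checks protect can be sketched as below, assuming the usual form p-hat ± z*·sqrt(p-hat(1 − p-hat)/n). The function name and the example data are illustrative assumptions; the critical value comes from Python's standard-library `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_interval(successes: int, n: int, confidence: float = 0.95):
    """Return (lower, upper) for a one-sample z-interval for a proportion.

    Assumes the independence and success-failure checks have already passed.
    """
    p_hat = successes / n
    # Two-sided critical value z* from the standard normal distribution.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    margin_of_error = z_star * standard_error
    return p_hat - margin_of_error, p_hat + margin_of_error


# Hypothetical sample: 45 successes out of 200 observations.
lower, upper = one_proportion_z_interval(45, 200)
print(round(lower, 3), round(upper, 3))
```

The standard error line is where independence matters, and the use of a normal critical value is where sample size adequacy matters.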

Ensuring Reliable Inference

These checks align with the AP Statistics goal of fostering responsible use of inferential methods. They prevent misuse of z-intervals in cases where the data structure does not support the required assumptions. Independence keeps the standard error calculation trustworthy, and adequate sample size provides the symmetry and shape needed for normal-based inference.

FAQ

The 10% condition is a practical guideline rather than a mathematical necessity. Its purpose is to ensure approximate independence when sampling without replacement.

In extremely large populations, where the sample comprises only a tiny fraction of all individuals, the condition is naturally satisfied even if the exact population size is unknown.

It may also be relaxed slightly in cases where small dependencies between observations have negligible effect on variability, though this should be justified and approached cautiously.
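One way to see why 10% is a sensible cut-off is the finite population correction factor, sqrt((N − n)/(N − 1)), which scales the standard error when sampling without replacement. This factor is standard in survey sampling but is not part of these notes, so treat the sketch below as supplementary; the function name and example sizes are assumptions.

```python
from math import sqrt


def finite_population_correction(n: int, population_size: int) -> float:
    """Factor by which the standard error shrinks when sampling
    n units without replacement from a population of the given size."""
    if not 0 < n <= population_size or population_size < 2:
        raise ValueError("need 0 < n <= population_size and population_size >= 2")
    return sqrt((population_size - n) / (population_size - 1))


# At exactly 10% of a population of 1,000 the factor stays close to 1,
# so ignoring it changes the standard error only slightly.
print(round(finite_population_correction(100, 1_000), 3))
# At 50% of the population the factor is much further from 1.
print(round(finite_population_correction(500, 1_000), 3))
```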

In many real studies, the exact population size is unknown. The key is assessing whether the sample is plausibly small relative to the total population.

You may rely on contextual reasoning, such as whether the population is known to be in the thousands or millions.
If the sample is extremely large relative to the likely population, you should question the independence assumption and consider alternative methods.

For a confidence interval, the success–failure check always uses the observed sample proportion, because the purpose is to assess normality of the sampling distribution under the data actually collected.

Using an assumed or historical proportion could produce misleading conclusions if the true rate differs.
The sample proportion reflects the realised balance of successes and failures, making it the appropriate quantity for this diagnostic.

A large sample cannot compensate for a fundamentally flawed sampling design. If data are not independent, variability estimates may be biased, making the z-interval unreliable regardless of sample size.

Independence ensures each observation contributes unique information. Without it, the interval may appear narrower than it should, masking uncertainty.

Sample size adequacy matters only after independence is established.

The researcher should first consider whether the shortfall is minor or substantial. If one count is only slightly below 10, normal approximation may still be reasonable, but this should be acknowledged.

Possible approaches include:
• increasing the sample size
• using an adjusted or alternative method (such as a simulation-based interval)
• reporting results with caution, noting potential skewness

The key is recognising that borderline cases require justification rather than automatic acceptance.
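As one example of an adjusted method, the Wilson score interval is often suggested when expected counts are small. It is not named in these notes and goes beyond the AP syllabus, so the sketch below is offered only as an illustration; the function name and sample counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95):
    """Return (lower, upper) for the Wilson score interval for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    denominator = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denominator
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denominator
    return centre - half_width, centre + half_width


# Hypothetical borderline case: only 3 successes in 40 observations,
# so the success count falls short of 10.
lower, upper = wilson_interval(3, 40)
print(round(lower, 3), round(upper, 3))
```

Unlike the standard z-interval, this interval always stays between 0 and 1, which is part of why it is preferred in borderline cases.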

Practice Questions

Question 1 (1–3 marks)
A researcher selects a random sample of 80 students from a large university to estimate the proportion who prefer online lectures. Explain whether the independence condition is satisfied for constructing a z-interval for a population proportion.

Question 1 mark scheme (1–3 marks)

• 1 mark: States that random sampling supports independence.
• 1 mark: Mentions that the university population is large, so sampling 80 students is unlikely to violate independence.
• 1 mark: Notes the 10% condition (sample must be no more than 10% of the population) and that it is satisfied here.

Full marks require reference to random sampling and the 10% condition.

Question 2 (4–6 marks)
A wildlife organisation wants to estimate the proportion of birds in a large reserve that carry a particular tag. They plan to capture and inspect a sample of birds.

(a) Explain why it is necessary to check the independence condition before constructing a confidence interval for the population proportion.
(b) The organisation expects that roughly 12% of birds carry the tag. They plan to sample 150 birds. Determine whether the sample size adequacy condition is met for using a z-interval. Show your reasoning.
(c) Based on your checks in parts (a) and (b), comment on whether the z-interval is appropriate for this study.

Question 2 mark scheme (4–6 marks)

(a) (2 marks)
• 1 mark: States that independence ensures observations do not influence each other.
• 1 mark: Explains that independence is required for the sampling distribution of the sample proportion to be modelled as approximately normal, allowing use of the z-interval.

(b) (2 marks)
• 1 mark: Calculates expected successes: 150 × 0.12 = 18 (≥ 10).
• 1 mark: Calculates expected failures: 150 × 0.88 = 132 (≥ 10).
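Both expected counts are at least 10, so the condition is met at the planning stage. Once the sample is collected, the check should be repeated with the observed sample proportion.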

(c) (2 marks)
• 1 mark: States that the independence condition is likely satisfied if the birds are randomly captured and the sample is no more than 10% of the population.
• 1 mark: Concludes that, because both independence and sample size adequacy conditions are satisfied, the z-interval is appropriate.

Full marks require correct checks and an overall justification.
