AP Syllabus focus:
‘Detailed explanation on how to check for independence in data collection methods: a. Ensuring data is collected through random sampling or randomized experiments. b. For sampling without replacement, confirming that the sample size is no more than 10% of the population to mitigate the dependence between samples.’
Independence checks ensure that sample observations do not meaningfully influence each other, protecting the validity of a one-sample z-interval for a population proportion and supporting reliable inference.
Checking for Independence
Establishing independence is essential because inference procedures for proportions assume that each observation in a sample behaves independently of the others. When this assumption holds, the sampling distribution of the sample proportion is more predictable and aligns with the theoretical properties required for constructing confidence intervals.
Why Independence Matters
Statistical methods such as confidence intervals rely on the idea that the outcomes we observe are not systematically connected. If observations influence one another, the sample proportion may not accurately reflect the population proportion. This would undermine the validity of the method and create misleading estimates.
Random Sampling and Randomized Experiments
The syllabus highlights that independence is primarily justified through random sampling or randomized experiments, both of which limit systematic bias and dependence.
Random Sampling: A data collection method in which every member of the population has an equal chance of being selected, helping ensure that observations are independent.
Random sampling supports independence because chance alone, rather than the researcher's judgment or any connection between individuals, determines who is selected.

A visual representation of simple random sampling in which each individual has an equal chance of selection, reinforcing the independence assumption for inference. The variety of colors represents different types of individuals but adds no extra statistical content beyond illustrating population diversity.
This process supports a representative sample and aligns with the assumptions required for inference.
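As a minimal sketch of what drawing a simple random sample can look like in code, the example below uses Python's standard `random` module. The roster of student IDs, the function name `simple_random_sample`, and the seed are illustrative assumptions, not part of the syllabus.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members so that every subset of size n is equally likely."""
    rng = random.Random(seed)         # a seed makes the draw reproducible for checking
    return rng.sample(population, n)  # random.sample draws without replacement

# Hypothetical roster: student IDs 1 to 1200 (illustrative only)
roster = list(range(1, 1201))
sample = simple_random_sample(roster, 80, seed=1)

print(len(sample), len(set(sample)))  # prints: 80 80 (no student is chosen twice)
```

Because the draw is made without replacement, the 10% condition would still need to be checked before treating the observations as approximately independent.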
Randomized Experiment: A study design in which participants are randomly assigned to conditions or treatments, ensuring that group membership does not depend on any preexisting characteristics.
Random assignment supports independence between treatment groups because chance, rather than any characteristic of the participants, determines who receives each treatment.

A structured flowchart illustrating the phases of a randomized controlled trial, highlighting the random assignment step that ensures independent treatment groups. Additional elements such as follow-up and analysis reflect real experimental designs but extend slightly beyond AP requirements.
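The random assignment step can be sketched in a few lines of Python. This is only an illustration under simple assumptions: two groups, hypothetical participant labels, and a helper named `randomly_assign` that is not taken from any source.

```python
import random

def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into two groups of near-equal size."""
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)          # chance alone decides the ordering
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical participant labels P01 to P20 (illustrative only)
volunteers = [f"P{i:02d}" for i in range(1, 21)]
treatment, control = randomly_assign(volunteers, seed=7)

print(len(treatment), len(control))  # prints: 10 10
```

Since the shuffle ignores everything about the participants, group membership cannot depend on any preexisting characteristic.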
The Role of the 10% Condition
When sampling without replacement, selecting one individual slightly changes the composition of the remaining population. This introduces dependence, but when the sample is small relative to the population, the effect is negligible.
The syllabus emphasizes confirming that the sample size is no more than 10% of the population, often called the 10% condition. This guideline ensures that the dependence introduced through sampling without replacement is too minor to compromise the assumptions of the method.
Although the 10% condition is not itself a formula, it acts as a practical rule to check whether the independence assumption remains reasonable in real-world sampling contexts.
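To see why 10% is a sensible cutoff, it can help to compute the finite population correction, the standard factor sqrt((N − n)/(N − 1)) by which sampling without replacement shrinks the standard deviation of the sample proportion. The sketch below is illustrative only; the function names are assumptions, and the (n, N) pairs are chosen to show one small and one large sampling fraction.

```python
import math

def meets_ten_percent_condition(n, N):
    """Return True when sample size n is no more than 10% of population size N."""
    return 10 * n <= N  # integer comparison avoids floating-point rounding

def finite_population_correction(n, N):
    """Factor applied to the standard deviation when sampling without replacement."""
    return math.sqrt((N - n) / (N - 1))

for n, N in [(80, 1200), (200, 3000), (500, 1200)]:
    ok = meets_ten_percent_condition(n, N)
    fpc = finite_population_correction(n, N)
    print(f"n={n}, N={N}: 10% condition met? {ok}; correction factor {fpc:.3f}")
```

When the sample is exactly 10% of a large population, the factor is about sqrt(0.9) ≈ 0.95, so ignoring the dependence changes the standard deviation by only around 5%. For the 500-of-1,200 case the factor falls well below that, which is why the usual formula is no longer trusted.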
Applying the Independence Checks
In practice, determining independence requires attention to how the data were collected rather than focusing on numerical computations. The following points outline how the student should evaluate independence:
Confirm random sampling:
Was the sample drawn using a method that gives all individuals an equal chance of selection?
Does the sampling plan avoid systematic inclusion or exclusion?
Verify randomized assignment when relevant:
If the study compares groups, were participants randomly assigned to conditions?
Does the design prevent confounding by balancing unobserved characteristics?
Evaluate sampling without replacement:
If the population is finite and sampling occurs without replacement, check the size relationship between sample and population.
Ensure the sample size n is no more than 10% of the population size N (n ≤ 0.10N) to maintain approximate independence.
Even when data arise from well-planned studies, independence must be documented explicitly, as inference procedures require justification grounded in study design rather than assumptions.
Independence in Context
The independence condition does not rely on the value of the sample proportion or uncertainty measures. Instead, it depends solely on how the sample was obtained. This distinguishes independence checks from normality checks and margin-of-error calculations, which rely on sample values.
Independence also situates the confidence interval in its proper context by linking the quality of the estimate to the integrity of the underlying data. Without independent observations, even large samples may produce distorted sample proportions.
Clarifying Common Misunderstandings
Students often mistake independence for a property of numerical results, but it is a property of design. A sample proportion that “looks reasonable” is not evidence of independence; only random sampling or randomized assignment can support that assumption.
Another frequent misunderstanding is believing that large sample sizes automatically ensure independence. In fact, a large nonrandom sample carries the same systematic bias as a small one and only makes the flawed estimate look more precise, so sample size alone never justifies the independence assumption.
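A small simulation can make the point about large nonrandom samples concrete. Everything here is a made-up illustration: a hypothetical population of 100,000 in which an easy-to-reach group is far more supportive than the rest, with group sizes and a seed chosen only for the demonstration.

```python
import random

rng = random.Random(42)  # fixed seed so the illustration is reproducible

# Hypothetical population of 100,000 people; overall support is exactly 50%.
# Group A is easy to reach and 80% supportive; group B is 20% supportive.
group_a = [1] * 40_000 + [0] * 10_000
group_b = [1] * 10_000 + [0] * 40_000
population = group_a + group_b

def proportion(values):
    """Fraction of 1s in a list of 0/1 responses."""
    return sum(values) / len(values)

convenience = rng.sample(group_a, 5_000)  # large, but drawn only from group A
srs = rng.sample(population, 500)         # small, but a simple random sample

print(f"true proportion:       {proportion(population):.3f}")
print(f"convenience (n=5000):  {proportion(convenience):.3f}")
print(f"random sample (n=500): {proportion(srs):.3f}")
```

Under these assumptions the convenience sample settles near 0.80 even though the true proportion is 0.50, while the much smaller random sample lands near the truth. The larger sample size does nothing to repair the flawed design.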
Relationship to the Broader Inference Framework
Within the framework of constructing a one-sample z-interval for a population proportion, independence serves as the foundational condition before any mathematical procedure can be applied. While later steps involve checking normality and calculating the standard error, these steps depend on independence being satisfied first.
Independence ensures the sampling distribution of the sample proportion behaves consistently with theoretical expectations. All subsequent inference procedures rely on this stability.
Summary of the Required Checks
For clarity, the independence checks required by the syllabus can be expressed as:
Random sampling or randomized experiment must be used.
If sampling without replacement, the sample must represent no more than 10% of the population to minimize dependence.
Independence must be evaluated based on study design, not statistical output.
Through careful attention to these conditions, students can confidently justify the independence assumption necessary for constructing valid confidence intervals for population proportions.
FAQ
Independence becomes harder to justify because individuals in small or highly interconnected populations may influence one another’s responses. Even with random sampling, shared environments or relationships can introduce dependence.
To maintain independence in such situations, ensure:
• The sample size is well below 10% of the population.
• The sampling frame includes diverse subgroups, reducing correlated responses.
If these conditions cannot be met, alternative methods such as stratified sampling may provide more independent observations.
When the population is extremely large, such as all users of a website, the 10% condition is effectively always satisfied because the sample represents a tiny fraction of the population.
If the population size is unknown, independence is usually assumed as long as the sample was selected randomly and from a broad, well-defined source. In such cases, exam responses should explicitly state that the sample is small relative to the presumed population.
Sometimes researchers use systematic sampling, such as selecting every tenth visitor, as a practical approximation of random sampling. Independence may still be reasonable if the ordering of individuals is not related to the variable of interest.
However, independence fails if the pattern of selection aligns with predictable cycles, such as time-based behaviours or grouped populations. When in doubt, justify independence by showing that the shortcut does not create a systematic pattern.
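The risk from predictable cycles can be shown with a deliberately artificial example. Assume a hypothetical record of 700 consecutive days in which the outcome of interest occurs only on two days of each seven-day week. Selecting every 7th day locks onto the cycle, whereas selecting every 10th day does not. The data and helper names below are assumptions made purely for illustration.

```python
def systematic_sample(values, start, step):
    """Take every step-th value, beginning at index start."""
    return values[start::step]

def proportion(values):
    """Fraction of 1s in a list of 0/1 outcomes."""
    return sum(values) / len(values)

# Hypothetical outcome: 1 on days 5 and 6 of each 7-day cycle, 0 otherwise
days = 700
outcome = [1 if day % 7 in (5, 6) else 0 for day in range(days)]

aligned = systematic_sample(outcome, start=5, step=7)     # always the same weekday
unaligned = systematic_sample(outcome, start=5, step=10)  # rotates through weekdays

print(round(proportion(outcome), 3))    # prints: 0.286
print(round(proportion(aligned), 3))    # prints: 1.0
print(round(proportion(unaligned), 3))  # prints: 0.286
```

The aligned scheme reports a proportion of 1.0 because every selected day falls on the same point in the cycle, which is exactly the kind of systematic pattern that undermines the independence justification.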
Clustering occurs when sampled individuals come from naturally formed groups, such as classrooms, neighbourhoods, or work teams. Individuals within clusters often share characteristics, causing responses to be more similar than those between clusters.
This violates independence by creating dependence within clusters. To address this:
• Use sampling methods that spread selections across many clusters.
• Consider cluster sampling only when you can treat clusters as units rather than individuals.
Randomisation eliminates systematic differences between treatment groups, ensuring that group membership is unrelated to pre-existing characteristics. This form of independence is stronger because it applies even when the participants do not form a representative sample of a wider population.
Random sampling supports independence by preventing selection bias, but randomisation additionally removes confounding when comparing groups, making it more robust in experimental designs.
Practice Questions
Question 1 (1–3 marks)
A school wants to estimate the proportion of students who regularly revise for mathematics. They select a simple random sample of 80 students from the school population of 1,200.
Explain whether the independence condition for constructing a confidence interval for a population proportion is satisfied.
Question 1 (mark scheme)
• 1 mark: States that the sample was selected using simple random sampling, supporting independence.
• 1 mark: Mentions that because sampling is without replacement, the 10% condition should be checked.
• 1 mark: Correctly notes that 80 is less than 10% of 1,200, so independence is reasonably satisfied.
Total: 3 marks
Question 2 (4–6 marks)
A researcher wishes to estimate the proportion of local residents who support building a new community centre. The researcher distributes a survey to 200 randomly selected households from a town containing 3,000 households.
(a) Explain how the method of data collection helps justify the independence of observations.
(b) The researcher later realises that the surveys were collected by approaching households along a single street rather than across the whole town. Discuss how this affects the independence assumption and the validity of using a confidence interval for the population proportion.
Question 2 (mark scheme)
(a)
• 1 mark: Identifies that households were chosen using random sampling.
• 1 mark: Explains that random sampling helps ensure each household has an equal chance of selection.
• 1 mark: States that this reduces dependence between observations, supporting the independence assumption.
• 1 mark: Notes that 200 is less than 10% of 3,000, making independence plausible when sampling without replacement.
(b)
• 1 mark: Identifies that sampling from a single street is not truly random or representative.
• 1 mark: Explains that households close to one another may be more similar, creating dependence between observations.
• 1 mark: States that the lack of independence weakens the validity of constructing a confidence interval for the population proportion.
Total: 6 marks
