AP Syllabus focus:
‘Check for independence with two independent, random samples or a randomized experiment. For sampling without replacement, ensure n1/N1 ≤ 10% and n2/N2 ≤ 10%. The sampling distribution of the difference in sample means should be approximately normal. If distributions are skewed or sample sizes are less than 30, use caution when applying this procedure.’
Verifying conditions ensures that a two-sample confidence interval for a difference in population means is statistically sound, producing trustworthy estimates grounded in appropriate sampling and distributional assumptions.
Independence Conditions
Establishing independence is essential because confidence interval formulas rely on the assumption that observed values do not influence one another.
Independent Random Samples
Two samples must be drawn independently. Independence is typically justified when researchers use simple random sampling or a randomized experiment, where each observation arises without affecting any other. Random sampling minimizes systematic bias and supports valid inference.

This diagram illustrates how a simple random sample is selected from a larger population, supporting the condition that observations be chosen independently. It reinforces the population–sample structure underlying independence assumptions required for valid inference. Although it does not explicitly reference the 10% condition, it clarifies the conceptual relationship between samples and populations.
Independence: A condition in which individual observations do not influence one another, ensuring unbiased estimators and valid probability calculations.
When sampling is conducted without replacement, an additional numerical guideline applies. The sample must be sufficiently small relative to the population to maintain approximate independence.
10% Condition
To meet the 10% condition, confirm that each sample size is no more than 10% of its respective population size. This maintains near-independence in the absence of replacement.
• Check that n1/N1 ≤ 0.10
• Check that n2/N2 ≤ 0.10
These checks help prevent the dependencies that arise when sampling consumes a substantial portion of the total population. Although simple, the rule is crucial when populations are finite and not extremely large.
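Under an assumed pair of sample and population sizes, the two checks can be sketched in a few lines of Python (the numbers are illustrative, not from any real study):

```python
# A minimal sketch of the 10% condition check for two samples drawn
# without replacement. The sizes below are hypothetical examples.

def ten_percent_condition(n1, N1, n2, N2):
    """Return True if each sample is at most 10% of its population."""
    return n1 / N1 <= 0.10 and n2 / N2 <= 0.10

# Example: samples of 18 and 22 from schools of about 600 students each
print(ten_percent_condition(18, 600, 22, 600))  # True: 0.03 and ~0.037 are both <= 0.10
```

A single oversized sample is enough to fail the check, since both ratios must stay at or below 0.10.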
Independence in Experimental Settings
In experiments, independence is typically established through random assignment rather than random sampling. Random assignment balances potential confounding variables, creating two groups that behave as if randomly sampled from the same population. This permits inference about cause-and-effect relationships within the experimental context.
Normality Conditions
A two-sample t-interval for the difference of means relies on the assumption that the sampling distribution of the difference in sample means is approximately normal.
Approximately Normal Sampling Distribution
The central requirement is that the statistic follows an approximately normal distribution. This assumption supports using t-distribution critical values to construct the interval.
Sampling Distribution of the Difference in Sample Means: The probability distribution of x̄1 − x̄2 produced by repeatedly sampling from both populations under identical conditions.
In practice, students rarely observe the true sampling distribution. Instead, they justify normality using sample size and sample shape.
Sample Size Considerations
Larger samples produce more reliable approximations of normality because of the Central Limit Theorem, which ensures that the distribution of sample means becomes approximately normal as sample size increases. When both sample sizes exceed 30, normality is typically assumed even if population distributions are somewhat skewed.

This figure demonstrates how sampling distributions of sample means become more bell-shaped as sample size increases, illustrating the Central Limit Theorem. It reinforces the concept that larger samples yield more reliable normal approximations. Although the figure focuses on one-sample means, the same principle underpins normality assumptions for two-sample procedures.
• If both samples have n ≥ 30, the t-interval procedure is generally robust.
• If either sample is small, rely on shape inspection.
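The effect of sample size can be seen in a short simulation (an illustrative sketch using an assumed exponential population, not data from the text): means of larger samples from a strongly right-skewed population vary less and are better approximated by a normal curve.

```python
# Illustrative CLT simulation using only the standard library.
# The exponential population is an assumed example of a right-skewed shape.
import random
import statistics

random.seed(42)  # fixed seed so the simulation is reproducible

def sample_means(n, reps=2000):
    """Means of `reps` samples of size n from an exponential population."""
    return [statistics.fmean(random.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

small = sample_means(5)    # small samples: sampling distribution still skewed
large = sample_means(50)   # larger samples: tighter and more bell-shaped

# The spread of the sampling distribution shrinks roughly like 1/sqrt(n)
print(statistics.stdev(small) > statistics.stdev(large))  # True
```

Plotting histograms of `small` and `large` would also show the larger-sample distribution looking noticeably more symmetric.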
Evaluating Skewness and Outliers
When sample sizes are under 30, students must check whether either sample displays strong skewness, heavy tails, or outliers. These features can distort the sampling distribution, making t-based inference less reliable.

This panel of histograms contrasts varying degrees of left skew, symmetry, and right skew, illustrating the importance of assessing distribution shape when sample sizes are small. Recognizing strong versus mild skewness helps determine whether normality conditions are satisfied for t-based inference. The figure includes several variations of skewness, offering nuance beyond the minimum required by the syllabus.
Skewness: An asymmetry in a distribution where values trail off more heavily to one side, potentially influencing mean-based inference.
If samples show only mild skewness and no extreme outliers, the procedure may still be appropriate. However, caution is required when distributions exhibit irregular shapes, as these may jeopardize the normality assumption.
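As a supplement to visual inspection, a standard sample skewness coefficient (g1 = m3 / m2^(3/2)) can put a rough number on asymmetry. The data below are invented for illustration, and AP responses should still rest on a written description of the shape rather than a single statistic.

```python
# Sketch of the Fisher-Pearson skewness coefficient g1 = m3 / m2^(3/2),
# computed from central moments. Data sets are hypothetical examples.
import statistics

def skewness(data):
    """Sample skewness: 0 for symmetric data, positive for a right tail."""
    n = len(data)
    mean = statistics.fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5

symmetric = [2, 3, 4, 5, 6, 7, 8]
right_skewed = [1, 1, 2, 2, 3, 4, 12]  # long right tail

print(round(skewness(symmetric), 3))  # 0.0 for a symmetric sample
print(skewness(right_skewed) > 1)     # True: pronounced right skew
```

Values near 0 suggest symmetry, while magnitudes well above 1 typically signal the kind of strong skew that should prompt caution with small samples.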
Because the validity of the t-interval relies on the normal approximation, confirming distributional suitability is a vital part of the verification process.
Combined Normality Requirement
Both samples must individually satisfy normality conditions. If one sample meets the condition but the other does not, the procedure is weakened. The interval’s reliability depends on the combined behavior of both distributions because the statistic incorporates variation from each group.
• Verify sample distribution shape for each group independently
• Confirm sample size sufficiency for each group independently
• Use caution when either distribution exhibits concerning features
Coordinating All Conditions
A confidence interval for the difference of means is appropriate only when both the independence and normality conditions are satisfied. Ensuring these assumptions supports the accuracy of the resulting interval and maintains the interpretability of the estimate.
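Once all conditions are verified, the interval itself is straightforward to compute. The sketch below uses the unpooled (Welch-style) standard error with hypothetical summary statistics; the critical value t* is an illustrative 95% value read from a t-table, and none of these numbers come from the text.

```python
# Minimal sketch of a two-sample t-interval for mu1 - mu2, assuming the
# independence and normality conditions have already been checked.
import math

def two_sample_t_interval(mean1, s1, n1, mean2, s2, n2, t_star):
    """CI for mu1 - mu2 using the unpooled (Welch) standard error."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    diff = mean1 - mean2
    return (diff - t_star * se, diff + t_star * se)

# Hypothetical summaries; t* = 2.021 is an illustrative table value
lo, hi = two_sample_t_interval(52.1, 8.4, 40, 47.5, 9.2, 45, 2.021)
print(round(lo, 2), round(hi, 2))  # approximately 0.74 8.46
```

Because the interval excludes 0 in this illustration, it would suggest a plausible difference between the two population means, provided the conditions genuinely hold.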
FAQ
How do I know when skewness counts as "strong"?
There is no fixed numerical threshold for identifying strong skewness. Instead, examiners expect students to use reasoned judgement based on the overall shape.
You should treat skewness as potentially problematic when:
• Most observations cluster at one end with a long, stretched tail
• The tail contains several influential values
• The distribution departs visibly from symmetry even after removing obvious outliers
Mild skewness is generally acceptable, but clear imbalance or tail heaviness warrants caution.
Do I need to check the normality condition for both samples separately?
Yes. Each sample contributes its own mean and variability to the statistic, so the suitability of each must be examined independently.
A failure of normality in one sample can undermine the validity of the entire procedure, even if the other sample is perfectly acceptable.
Checking both ensures that the combined distribution of the difference in sample means behaves appropriately.
What should I do if a small sample contains an outlier?
You should first determine whether the outlier is genuine or due to an error.
If the outlier represents a data entry mistake or clear measurement issue, removing it is justifiable.
If it is a legitimate observation, assess the remaining distribution:
• If skewness remains strong, proceed with caution
• If skewness becomes mild, the t-interval may be reasonable
Always justify your choice based on context and data quality.
Why does sampling without replacement threaten independence, and how does the 10% condition help?
Without replacement, selecting one individual affects which individuals remain available, creating subtle dependence between observations.
The 10% condition ensures that the sample is small enough relative to the population that removing each selected unit barely changes population composition.
As a result, the dependence becomes negligible, and the sample behaves as though drawn independently.
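This reasoning can be made concrete with the finite population correction factor, sqrt((N − n)/(N − 1)), which measures how much sampling without replacement shrinks the standard error. The population and sample sizes below are illustrative:

```python
# Sketch of the finite population correction (FPC). When n <= 0.10 * N
# the factor stays close to 1, so treating draws as independent is a
# harmless approximation. Sizes here are hypothetical examples.
import math

def fpc(n, N):
    """Finite population correction for a sample of n from a population of N."""
    return math.sqrt((N - n) / (N - 1))

print(round(fpc(50, 1000), 3))   # n is 5% of N: correction ~0.975, nearly 1
print(round(fpc(500, 1000), 3))  # n is 50% of N: correction ~0.707, substantial
```

The first case is well within the 10% guideline, so ignoring the correction barely changes the standard error; the second shows why large sampling fractions break the independence approximation.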
How can I justify the normality condition from a histogram when the sample size is below 30?
If the histogram shows no extreme outliers, has a clear central peak, and tails that taper relatively evenly, this supports the use of a t-interval.
You may strengthen your justification by noting:
• Moderate deviations from symmetry are unlikely to distort the sampling distribution
• The sample size is reasonably large even if below 30
• Contextual factors suggest that the underlying population is not heavily skewed
Such arguments typically suffice in exam settings when supported by a clear description of the shape.
Practice Questions
Question 1 (1–3 marks)
A researcher selects two independent random samples to compare the mean time students spend on homework in School A and School B. The sample sizes are 18 for School A and 22 for School B. Both samples were taken without replacement from schools with populations of roughly 600 students each.
(a) State whether the independence condition is satisfied, referring to the 10% condition.
(b) Give one reason why the normality condition may require additional checking for School A.
Question 1
(a) 1 mark for correctly stating that the independence condition is satisfied because each sample is less than 10% of its population (18/600 and 22/600 both clearly below 0.10).
(b) 1–2 marks for noting that a sample size of 18 is under 30, so the shape of the sample distribution must be checked for skewness or outliers. Full marks require explicit mention that normality cannot be assumed solely from sample size.
Question 2 (4–6 marks)
A study compares the mean daily screen time of teenagers in two different regions. The researchers collect two independent samples:
• Region 1: n = 25, distribution is moderately right-skewed
• Region 2: n = 45, distribution is approximately symmetric
The samples are taken randomly and without replacement from very large populations.
(a) Assess whether the independence condition is met for these data.
(b) Evaluate whether the normality condition is sufficiently satisfied to proceed with a two-sample confidence interval for the difference in population means.
(c) Explain how the smaller sample size of Region 1 affects the normality assessment compared with Region 2.
Question 2
(a) 1–2 marks for stating that independence is satisfied because the samples are random and taken from very large populations, making the 10% condition easily met.
(b) 2–3 marks for evaluating normality:
• Region 2 is acceptable due to larger n (45) and approximately symmetric data.
• Region 1 requires caution because n = 25 (below 30) and moderately right-skewed data mean normality cannot be assumed; however, inference may still be reasonable if skewness is not severe.
Award full marks for clear comparison and justification.
(c) 1 mark for explaining that Region 1's smaller sample size makes the procedure more sensitive to skewness, while Region 2's larger sample size allows the Central Limit Theorem to support approximate normality even with mild asymmetry.
