Coefficient of Determination (2.8.3) | AP Statistics Notes

AP Syllabus focus: 'In simple linear regression, r squared is the coefficient of determination: the proportion of response-variable variation explained by the explanatory variable.'

In simple linear regression, one key question is how much of the response variable’s variability is accounted for by the explanatory variable. The coefficient of determination gives that information directly.

Understanding the Coefficient of Determination

In AP Statistics, the coefficient of determination is written as $r^2$ and is used with a simple linear regression model. It tells you how much of the variability in the response variable can be explained by its linear relationship with the explanatory variable.

Coefficient of determination: In simple linear regression, $r^2$ is the proportion of variation in the response variable explained by the explanatory variable.

The word variation refers to how much the response values differ from one another. A regression model tries to account for part of that difference by using the explanatory variable. If the linear model fits well, a larger share of the response variable’s variation is explained. If the fit is weaker, more of the variation remains unexplained.

Three scatterplots with regression lines illustrate that $r^2$ increases as points lie closer to the least-squares line. The visual message matches the interpretation of $r^2$ as the proportion of variation in the response variable explained by its linear relationship with the explanatory variable.

When AP questions ask for an interpretation of $r^2$, they are asking about the response variable, not the explanatory variable. This is one of the most important points to keep clear.

Explained and unexplained variation

The coefficient of determination is a proportion, so it can be written as a decimal or expressed as a percent. A value closer to $1$ means the model explains more of the response variable’s variation. A value closer to $0$ means the model explains less.

$Explained\ Proportion = r^2$

$Explained\ Proportion$ = proportion of variation in the response variable explained by the explanatory variable, unitless

$r^2$ = coefficient of determination, unitless

$Unexplained\ Proportion = 1-r^2$

$Unexplained\ Proportion$ = proportion of variation in the response variable not explained by the linear model, unitless

Because $r^2$ is a proportion, it has no units.

A scatterplot with a least-squares regression line highlights one point and its vertical residual (observed $y$ minus predicted $\hat{y}$). This picture connects “unexplained variation” to the vertical deviations of data points from the regression line.

The coefficient of determination does not describe individual observations one by one. Instead, it summarizes how much of the overall variation in the response variable is accounted for by the linear model.
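As a rough numerical sketch of the explained and unexplained proportions, the Python snippet below fits a least-squares line to a small made-up data set and splits the total variation in the response into the two parts. The data values and the helper name `explained_and_unexplained` are assumptions chosen for illustration; they do not come from any AP problem.

```python
def explained_and_unexplained(x, y):
    """Return (explained, unexplained) proportions of the variation in y
    for the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Least-squares slope and intercept
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Total variation: squared deviations of y from its mean
    total = sum((yi - y_bar) ** 2 for yi in y)
    # Unexplained variation: squared residuals (observed minus predicted)
    residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

    unexplained = residual / total
    return 1 - unexplained, unexplained


# Illustrative data only (assumed values)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
explained, unexplained = explained_and_unexplained(x, y)
print(round(explained, 2), round(unexplained, 2))  # 0.6 0.4
```

Here about 60% of the variation in the made-up response values is explained by the line, and the remaining 40% shows up as residuals.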

Interpreting $r^2$ in Context

A correct AP interpretation of $r^2$ should include:

the percentage or proportion explained
the response variable
the explanatory variable
the idea of a linear relationship or linear model

A strong interpretation follows this pattern: About $100r^2\%$ of the variation in [response variable] is explained by the linear relationship with [explanatory variable].

Be careful with wording. The phrase “variation in the response variable” is essential. Saying that $r^2$ is the percent of points on the line, the percent of correct predictions, or the percent caused by the explanatory variable is not correct.

What $r^2$ does and does not tell you

A larger $r^2$ means the linear model accounts for more of the response variable’s variability in the data set. This makes $r^2$ useful for describing how well the explanatory variable helps account for changes in the response variable.

A smaller $r^2$ means the model explains less of that variability. That does not automatically make the model useless. In many real settings, response variables are influenced by many factors, so even a moderate explained proportion may still be meaningful in context.

Key Properties of $r^2$

Several properties of the coefficient of determination are important for AP Statistics:

In simple linear regression, $r^2$ is between $0$ and $1$.
Since it is a square, $r^2$ cannot be negative.
It describes explained variation in the response variable, not in the explanatory variable.
It is specifically tied to the linear regression model.

Because $r^2$ is squared, it does not show whether the relationship is positive or negative. It measures how much of the variation is explained, not the direction of the relationship.
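A quick way to see that the direction is lost is to flip the sign of every response value and compare. The sketch below uses made-up data and a hypothetical helper `correlation`: the two correlations have opposite signs, but their squares agree.

```python
from math import sqrt


def correlation(x, y):
    """Correlation coefficient r between two equal-length lists."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sqrt(sxx * syy)


# Illustrative data only (assumed values)
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 5]          # tends to increase with x
y_down = [-v for v in y_up]     # same pattern, mirrored to decrease with x

r_up = correlation(x, y_up)
r_down = correlation(x, y_down)
print(round(r_up, 3), round(r_down, 3))            # 0.775 -0.775
print(round(r_up ** 2, 3), round(r_down ** 2, 3))  # 0.6 0.6
```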

Common Mistakes to Avoid

Students often lose points by interpreting $r^2$ too loosely. Avoid these errors:

saying the explanatory variable explains $r^2$ of the observations
saying $100r^2\%$ of the response variable is caused by the explanatory variable
treating $r^2$ as a measure of slope
forgetting to mention the response variable
leaving out the word variation

Another common mistake is giving a definition with no context. On the AP exam, your interpretation should name the actual variables in the problem. The wording should match the situation being studied.

Writing AP-Ready Interpretations

When you see $r^2$ in a regression setting, move through these steps:

identify the response variable
identify the explanatory variable
convert $r^2$ to a percent if that makes the interpretation clearer
state that this percent of the variation in the response variable is explained by the linear relationship with the explanatory variable
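The steps above can be sketched as a small template function. This is only an illustration of the sentence pattern; the function name `interpret_r_squared` and its parameters are hypothetical, and the example values are taken from the first practice question in these notes.

```python
def interpret_r_squared(r2, response, explanatory):
    """Build an AP-style interpretation sentence for r^2.

    r2 is the coefficient of determination as a decimal between 0 and 1;
    response and explanatory are the variable names from the problem.
    """
    if not 0 <= r2 <= 1:
        raise ValueError("r^2 must be between 0 and 1")
    percent = 100 * r2  # convert the proportion to a percent
    return (
        f"About {percent:.0f}% of the variation in {response} "
        f"is explained by the linear relationship with {explanatory}."
    )


sentence = interpret_r_squared(0.49, "final exam score", "number of class absences")
print(sentence)
```

The template forces all four required pieces into the sentence: the percent, the word variation, the response variable, and the linear relationship with the explanatory variable.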

This approach keeps the interpretation focused exactly where AP Statistics wants it: on the proportion of response-variable variation explained by the explanatory variable.

FAQ

In simple linear regression with one explanatory variable and a least-squares regression line that includes an intercept, the coefficient of determination equals the square of the correlation coefficient $r$.

That shortcut does not automatically extend to more complicated regression settings. In AP Statistics, you usually work in the simple case, so $r^2$ and $r$ are directly connected there.
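To check this connection numerically, the sketch below computes $r$ from its definition, then computes the explained proportion separately from the least-squares residuals, and compares the two. The data values are made up for illustration.

```python
from math import isclose, sqrt

# Illustrative data only (assumed values)
x = [2, 4, 6, 8, 10]
y = [1, 3, 2, 5, 4]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Route 1: square the correlation coefficient
r = sxy / sqrt(sxx * syy)

# Route 2: fit the least-squares line (with intercept) and compare
# squared residuals with total squared deviations from the mean
slope = sxy / sxx
intercept = y_bar - slope * x_bar
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
r2_from_fit = 1 - sse / syy

print(isclose(r ** 2, r2_from_fit))  # True
```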

If you only change units, such as inches to centimeters or pounds to kilograms, you are just rescaling the axes. That does not change the proportion of response-variable variation explained by the linear model.

So the numerical value of $r^2$ stays the same under ordinary unit conversions. A different result happens only if you change the variables more substantially, such as using a nonlinear transformation.
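The unit-conversion claim can be checked with a short sketch. The heights and weights below are invented for illustration, and `r_squared` is a hypothetical helper that uses the simple-regression shortcut of squaring the correlation.

```python
from math import isclose


def r_squared(x, y):
    """r^2 for simple linear regression, computed as the squared correlation."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


# Illustrative data only (assumed values)
height_in = [60, 62, 65, 68, 70]       # inches
weight_lb = [115, 120, 140, 150, 165]  # pounds

# Ordinary unit conversions: multiply by a positive constant
height_cm = [h * 2.54 for h in height_in]
weight_kg = [w * 0.45359237 for w in weight_lb]

r2_original = r_squared(height_in, weight_lb)
r2_converted = r_squared(height_cm, weight_kg)
print(isclose(r2_original, r2_converted))  # True
```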

Two data sets can have the same $r^2$ even if they look quite different.

For example, they may differ in:

sample size
clustering
unusual points
overall shape

This is why $r^2$ should not be the only description of a regression relationship. It captures one feature of model performance, not the full visual pattern of the data.

When the explanatory variable only takes values in a narrow interval, there may be less visible change in the response variable across the observed data. That can make the model explain a smaller share of the total variation.

So a study with a restricted range of $x$ values may produce a lower $r^2$ even if the same underlying linear relationship exists in a broader population.
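A small constructed example makes the restricted-range effect concrete. In the sketch below, the response follows $y = 2x$ plus a fixed wobble of $+3$ or $-3$; this data pattern and the helper `r_squared` are assumptions made for illustration. The same relationship gives a much lower $r^2$ when only a narrow slice of $x$ values is observed.

```python
def r_squared(x, y):
    """r^2 for simple linear regression, computed as the squared correlation."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


# Constructed data: same underlying line y = 2x with a +3 / -3 wobble
x_full = list(range(1, 21))
y_full = [2 * xi + (3 if xi % 2 else -3) for xi in x_full]

# Keep only observations with x between 9 and 12 (a restricted range)
narrow = [(xi, yi) for xi, yi in zip(x_full, y_full) if 9 <= xi <= 12]
x_narrow = [xi for xi, _ in narrow]
y_narrow = [yi for _, yi in narrow]

r2_full = r_squared(x_full, y_full)
r2_narrow = r_squared(x_narrow, y_narrow)
print(round(r2_full, 2), round(r2_narrow, 2))  # 0.93 0.1
```

Over the narrow range, the wobble is large compared with the change in $y$ produced by the line, so the line explains only a small share of the variation.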

Conceptually, $r^2$ compares the regression model with a very basic baseline: predicting every response value using the mean of the response variable.

If the regression model improves a lot on that baseline, $r^2$ is larger. If it improves only a little, $r^2$ is smaller.

So $r^2$ is really about how much better the linear model explains response variation than a no-relationship prediction based only on the mean.
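One standard way to write this comparison as a formula for a least-squares line with an intercept is shown below. The symbols are notation assumed here rather than taken from the notes above: $y_i$ for observed responses, $\hat{y}_i$ for predicted responses, and $\bar{y}$ for the mean response.

$r^2 = 1 - \dfrac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2}$

The numerator is the total squared prediction error of the regression line, and the denominator is the total squared prediction error of the mean-only baseline. When the line predicts no better than the mean, the ratio is $1$ and $r^2 = 0$; when the line predicts every point exactly, the ratio is $0$ and $r^2 = 1$.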

Practice Questions

A simple linear regression model uses number of class absences to predict final exam score. The computer output reports $r^2 = 0.49$.

Interpret $r^2$ in context.

1 mark: States that $49\%$ (or $0.49$) of the variation in final exam scores is explained.
1 mark: Clearly connects the explanation to the linear relationship with number of class absences.

A simple linear regression model is used to predict monthly heating cost from average outdoor temperature. The regression output shows $r^2 = 0.64$.

(a) Interpret $r^2$ in context. (2 marks)

(b) What proportion of the variation in monthly heating cost is not explained by the model? (1 mark)

(c) A student says, “Because $r^2 = 0.64$, temperature determines $64\%$ of a home’s heating cost.” Explain why this statement is incorrect. (2 marks)

(a) 1 mark: States that $64\%$ of the variation in monthly heating cost is explained.
(a) 1 mark: Links the explanation to the linear relationship with average outdoor temperature.
(b) 1 mark: Gives $1 - 0.64 = 0.36$ or $36\%$.
(c) 1 mark: Explains that $r^2$ refers to explained variation, not exact determination of individual costs.
(c) 1 mark: Explains that $r^2$ does not justify a causation claim.


Written by:

Dr Rahil Sachak-Patwa

Oxford University - PhD Mathematics

Rahil spent ten years working as a private tutor, teaching students for GCSEs, A-Levels, and university admissions. During his PhD he published papers on modelling infectious disease epidemics and was a tutor to undergraduate and master's students for mathematics courses.
