AP Statistics study notes

2.8.3 Coefficient of Determination

AP Syllabus focus: 'In simple linear regression, r squared is the coefficient of determination: the proportion of response-variable variation explained by the explanatory variable.'

In simple linear regression, one key question is how much of the response variable’s variability is accounted for by the explanatory variable. The coefficient of determination gives that information directly.

Understanding the Coefficient of Determination

In AP Statistics, the coefficient of determination is written as $r^2$ and is used with a simple linear regression model. It tells you how much of the variability in the response variable can be explained by its linear relationship with the explanatory variable.

Coefficient of determination: In simple linear regression, $r^2$ is the proportion of variation in the response variable explained by the explanatory variable.

The word variation refers to how much the response values differ from one another. A regression model tries to account for part of that difference by using the explanatory variable. If the linear model fits well, a larger share of the response variable’s variation is explained. If the fit is weaker, more of the variation remains unexplained.

Three scatterplots with regression lines illustrate that $r^2$ increases as points lie closer to the least-squares line. The visual message matches the interpretation of $r^2$ as the proportion of variation in the response variable explained by its linear relationship with the explanatory variable.

When AP questions ask for an interpretation of $r^2$, they are asking about the response variable, not the explanatory variable. This is one of the most important points to keep clear.

Explained and unexplained variation

The coefficient of determination is a proportion, so it can be written as a decimal or expressed as a percent. A value closer to $1$ means the model explains more of the response variable’s variation. A value closer to $0$ means the model explains less.

$\text{Explained Proportion} = r^2$

$\text{Explained Proportion}$ = proportion of variation in the response variable explained by the explanatory variable, unitless

$r^2$ = coefficient of determination, unitless

$\text{Unexplained Proportion} = 1 - r^2$

$\text{Unexplained Proportion}$ = proportion of variation in the response variable not explained by the linear model, unitless

Because $r^2$ is a proportion, it has no units.

A scatterplot with a least-squares regression line highlights one point and its vertical residual (observed $y$ minus predicted $\hat{y}$). This picture connects “unexplained variation” to the vertical deviations of data points from the regression line.

The coefficient of determination does not describe individual observations one by one. Instead, it summarizes how much of the overall variation in the response variable is associated with the linear model.
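As a rough illustration of how the explained and unexplained proportions come from the residuals, the sketch below fits a least-squares line by hand. The data (hours studied and quiz scores) are made up for this example, and the variable names are assumptions rather than anything from the AP materials.

```python
# Hypothetical data: hours studied (x) and quiz score (y).
x = [1, 2, 3, 4, 5, 6]
y = [52, 58, 61, 70, 74, 75]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Residual = observed y minus predicted y (the vertical deviation from the line).
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

sse = sum(e ** 2 for e in residuals)      # variation left unexplained by the line
sst = sum((yi - y_bar) ** 2 for yi in y)  # total variation in the response

r_squared = 1 - sse / sst   # explained proportion
unexplained = sse / sst     # unexplained proportion, equal to 1 - r^2

print(f"explained proportion:   {r_squared:.3f}")
print(f"unexplained proportion: {unexplained:.3f}")
```

The two printed proportions always add to 1, which matches the pair of formulas above.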

Interpreting $r^2$ in Context

A correct AP interpretation of $r^2$ should include:

  • the percentage or proportion explained

  • the response variable

  • the explanatory variable

  • the idea of a linear relationship or linear model

A strong interpretation follows this pattern: About $100r^2$% of the variation in [response variable] is explained by the linear relationship with [explanatory variable].

Be careful with wording. The phrase “variation in the response variable” is essential. Saying that $r^2$ is the percent of points on the line, the percent of correct predictions, or the percent caused by the explanatory variable is not correct.

What $r^2$ does and does not tell you

A larger $r^2$ means the linear model accounts for more of the response variable’s variability in the data set. This makes $r^2$ useful for describing how well the explanatory variable helps account for changes in the response variable.

A smaller $r^2$ means the model explains less of that variability. That does not automatically make the model useless. In many real settings, response variables are influenced by many factors, so even a moderate explained proportion may still be meaningful in context.

Key Properties of $r^2$

Several properties of the coefficient of determination are important for AP Statistics:

  • In simple linear regression, $r^2$ is between $0$ and $1$.

  • Since it is a square, $r^2$ cannot be negative.

  • It describes explained variation in the response variable, not in the explanatory variable.

  • It is specifically tied to the linear regression model.

Because $r^2$ is squared, it does not show whether the relationship is positive or negative. It measures how much of the variation is explained, not the direction of the relationship.
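To see the loss of direction concretely, the following sketch compares a small made-up data set with its mirror image (every response value negated). The `correlation` helper and the data are assumptions written for this illustration.

```python
import math

def correlation(x, y):
    """Pearson correlation coefficient r, computed from deviations about the means."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: one increasing pattern and its downward mirror image.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 6]
y_down = [-v for v in y_up]

r_up = correlation(x, y_up)
r_down = correlation(x, y_down)

print(f"r:   {r_up:.3f} vs {r_down:.3f}")        # opposite signs
print(f"r^2: {r_up**2:.3f} vs {r_down**2:.3f}")  # identical values
```

The two correlations have opposite signs, but squaring removes the sign, so $r^2$ alone cannot tell the two patterns apart.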

Common Mistakes to Avoid

Students often lose points by interpreting $r^2$ too loosely. Avoid these errors:

  • saying the explanatory variable explains $r^2$ of the observations

  • saying $100r^2$% of the response variable is caused by the explanatory variable

  • treating $r^2$ as a measure of slope

  • forgetting to mention the response variable

  • leaving out the word variation

Another common mistake is giving a definition with no context. On the AP exam, your interpretation should name the actual variables in the problem. The wording should match the situation being studied.

Writing AP-Ready Interpretations

When you see $r^2$ in a regression setting, move through these steps:

  • identify the response variable

  • identify the explanatory variable

  • convert $r^2$ to a percent if that makes the interpretation clearer

  • state that this percent of the variation in the response variable is explained by the linear relationship with the explanatory variable

This approach keeps the interpretation focused exactly where AP Statistics wants it: on the proportion of response-variable variation explained by the explanatory variable.
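The steps above can be sketched as a small template function. The name `interpret_r_squared` and its parameters are hypothetical, and the example values come from the practice questions in these notes.

```python
def interpret_r_squared(r_squared, response, explanatory):
    """Build an interpretation sentence that names both variables in context."""
    if not 0 <= r_squared <= 1:
        raise ValueError("r^2 must be between 0 and 1 in simple linear regression")
    percent = 100 * r_squared  # convert the proportion to a percent
    return (
        f"About {percent:.0f}% of the variation in {response} is explained "
        f"by the linear relationship with {explanatory}."
    )

print(interpret_r_squared(0.64, "monthly heating cost", "average outdoor temperature"))
```

The template only supplies the wording. Choosing which variable is the response and which is the explanatory variable still has to come from reading the problem.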

FAQ

In simple linear regression with one explanatory variable and a least-squares regression line that includes an intercept, yes: the coefficient of determination equals the square of the correlation coefficient.

That shortcut does not automatically extend to more complicated regression settings. In AP Statistics, you usually work in the simple case, so $r^2$ and $r$ are directly connected there.
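One way to check this connection numerically is to compute the explained proportion from the fitted line and compare it with the square of the correlation. The data below are made up, and the variable names are assumptions for this sketch.

```python
import math

# Hypothetical data with a negative association.
x = [2, 4, 5, 7, 9]
y = [10, 9, 7, 6, 2]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

# Correlation coefficient r.
r = sxy / math.sqrt(sxx * syy)

# Least-squares line with an intercept, then the explained proportion from its residuals.
slope = sxy / sxx
intercept = y_bar - slope * x_bar
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
r2_from_fit = 1 - sse / syy

print(f"r squared:            {r ** 2:.6f}")
print(f"explained proportion: {r2_from_fit:.6f}")
```

The two printed values agree up to floating-point rounding, even though $r$ itself is negative here.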

If you only change units, such as inches to centimeters or pounds to kilograms, you are just rescaling the axes. That does not change the proportion of response-variable variation explained by the linear model.

So the numerical value of $r^2$ stays the same under ordinary unit conversions. A different result happens only if you change the variables more substantially, such as using a nonlinear transformation.
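A quick check of this invariance, using made-up heights and weights and a hypothetical `r_squared` helper, might look like this:

```python
def r_squared(x, y):
    """r^2 for simple linear regression, computed as the squared correlation."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy ** 2 / (sxx * syy)

# Hypothetical data: heights in inches and weights in pounds.
heights_in = [60, 62, 65, 68, 70, 73]
weights_lb = [115, 120, 140, 155, 160, 185]

# The same measurements after ordinary unit conversions.
heights_cm = [h * 2.54 for h in heights_in]
weights_kg = [w * 0.45359237 for w in weights_lb]

r2_original = r_squared(heights_in, weights_lb)
r2_converted = r_squared(heights_cm, weights_kg)

print(f"inches and pounds:        {r2_original:.6f}")
print(f"centimeters and kilograms: {r2_converted:.6f}")
```

Multiplying a variable by a positive constant scales the numerator and denominator of the calculation by the same factor, so the ratio is unchanged.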

Yes. Two data sets can have the same $r^2$ even if they look quite different.

For example, they may differ in:

  • sample size

  • clustering

  • unusual points

  • overall shape

This is why $r^2$ should not be the only description of a regression relationship. It captures one feature of model performance, not the full visual pattern of the data.
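A classic illustration, not part of these notes, is Anscombe's quartet. The sketch below uses its first two data sets, with values as commonly reproduced: the first is roughly linear with scatter, the second follows a smooth curve. The `r_squared` helper is an assumption for this example.

```python
def r_squared(x, y):
    """r^2 for simple linear regression, computed as the squared correlation."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy ** 2 / (sxx * syy)

# Anscombe's quartet, data sets I and II (they share the same x values).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y_linear = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y_curved = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

r2_linear = r_squared(x, y_linear)
r2_curved = r_squared(x, y_curved)

print(f"roughly linear pattern: r^2 = {r2_linear:.3f}")
print(f"curved pattern:         r^2 = {r2_curved:.3f}")
```

Both values come out at about two thirds, yet a scatterplot shows that a line is a reasonable model for only the first data set.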

When the explanatory variable only takes values in a narrow interval, there may be less visible change in the response variable across the observed data. That can make the model explain a smaller share of the total variation.

So a study with a restricted range of $x$ values may produce a lower $r^2$ even if the same underlying linear relationship exists in a broader population.
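The effect can be explored with a small simulation. Everything below is an assumption made for illustration: the seed, the sample size, the true line $y = 2x$, and the noise level are arbitrary choices, not values from any real study.

```python
import random

def r_squared(x, y):
    """r^2 for simple linear regression, computed as the squared correlation."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy ** 2 / (sxx * syy)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Simulated population: the same linear relationship plus random noise.
x_full = [rng.uniform(0, 100) for _ in range(500)]
y_full = [2 * xi + rng.gauss(0, 20) for xi in x_full]

# Restricted study: keep only observations with x in a narrow interval.
narrow = [(xi, yi) for xi, yi in zip(x_full, y_full) if 40 <= xi <= 60]
x_narrow = [p[0] for p in narrow]
y_narrow = [p[1] for p in narrow]

r2_full = r_squared(x_full, y_full)
r2_narrow = r_squared(x_narrow, y_narrow)

print(f"full range of x:   r^2 = {r2_full:.3f}")
print(f"narrow range of x: r^2 = {r2_narrow:.3f}")
```

In the narrow sample the noise is just as large, but the line has less room to move the response, so the explained share of the variation drops.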

Conceptually, $r^2$ compares the regression model with a very basic baseline: predicting every response value using the mean of the response variable.

If the regression model improves a lot on that baseline, $r^2$ is larger. If it improves only a little, $r^2$ is smaller.

So $r^2$ is really about how much better the linear model explains response variation than a no-relationship prediction based only on the mean.
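In symbols, this comparison is commonly written as follows, where $\hat{y}_i$ is the prediction from the regression line and $\bar{y}$ is the mean of the response values (standard notation, not taken from the AP wording above):

$$r^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2}$$

The numerator is the sum of squared errors from the regression line, and the denominator is the sum of squared errors from the mean-only baseline. When the line barely improves on the baseline, the fraction is close to $1$ and $r^2$ is close to $0$.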

Practice Questions

A simple linear regression model uses number of class absences to predict final exam score. The computer output reports $r^2 = 0.49$.

Interpret $r^2$ in context.

  • 1 mark: States that $49$% (or $0.49$) of the variation in final exam scores is explained.

  • 1 mark: Clearly connects the explanation to the linear relationship with number of class absences.

A simple linear regression model is used to predict monthly heating cost from average outdoor temperature. The regression output shows $r^2 = 0.64$.

(a) Interpret $r^2$ in context. (2 marks)

(b) What proportion of the variation in monthly heating cost is not explained by the model? (1 mark)

(c) A student says, “Because $r^2 = 0.64$, temperature determines $64$% of a home’s heating cost.” Explain why this statement is incorrect. (2 marks)

  • (a) 1 mark: States that $64$% of the variation in monthly heating cost is explained.

  • (a) 1 mark: Links the explanation to the linear relationship with average outdoor temperature.

  • (b) 1 mark: Gives $1 - 0.64 = 0.36$ or $36$%.

  • (c) 1 mark: Explains that $r^2$ refers to explained variation, not exact determination of individual costs.

  • (c) 1 mark: Explains that $r^2$ does not justify a causation claim.
