AP Syllabus focus: 'In simple linear regression, r squared is the coefficient of determination: the proportion of response-variable variation explained by the explanatory variable.'
In simple linear regression, one key question is how much of the response variable’s variability is accounted for by the explanatory variable. The coefficient of determination gives that information directly.
Understanding the Coefficient of Determination
In AP Statistics, the coefficient of determination is written as $r^2$ and is used with a simple linear regression model. It tells you how much of the variability in the response variable can be explained by its linear relationship with the explanatory variable.
Coefficient of determination: In simple linear regression, $r^2$ is the proportion of variation in the response variable explained by the explanatory variable.
The word variation refers to how much the response values differ from one another. A regression model tries to account for part of that difference by using the explanatory variable. If the linear model fits well, a larger share of the response variable’s variation is explained. If the fit is weaker, more of the variation remains unexplained.

Three scatterplots with regression lines illustrate that $r^2$ increases as points lie closer to the least-squares line. The visual message matches the interpretation of $r^2$ as the proportion of variation in the response variable explained by its linear relationship with the explanatory variable.
When AP questions ask for an interpretation of $r^2$, they are asking about the response variable, not the explanatory variable. This is one of the most important points to keep clear.
Explained and unexplained variation
The coefficient of determination is a proportion, so it can be written as a decimal or expressed as a percent. A value closer to $1$ means the model explains more of the response variable’s variation. A value closer to $0$ means the model explains less.
$r^2$ = coefficient of determination: the proportion of variation in the response variable explained by the explanatory variable, unitless
$1 - r^2$ = proportion of variation in the response variable not explained by the linear model, unitless
Because $r^2$ is a proportion, it has no units.

A scatterplot with a least-squares regression line highlights one point and its vertical residual (observed $y$ minus predicted $\hat{y}$). This picture connects “unexplained variation” to the vertical deviations of data points from the regression line.
It does not describe individual observations one by one. Instead, it summarizes how much of the overall variation in the response variable is associated with the linear model.
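As a rough illustration of explained versus unexplained variation, the sketch below fits a least-squares line to a small made-up data set and splits the total variation in $y$ into the part left in the residuals and the part the line accounts for. The data values, the helper name `fit_line`, and the variable names `sst` and `sse` are assumptions chosen for this example, not part of any AP problem.

```python
# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

intercept, slope = fit_line(x, y)
y_bar = sum(y) / len(y)

# Total variation: squared deviations of y from its mean.
sst = sum((yi - y_bar) ** 2 for yi in y)
# Unexplained variation: squared residuals (observed minus predicted).
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

r_squared = 1 - sse / sst   # proportion of variation explained
unexplained = sse / sst     # proportion not explained, equal to 1 - r_squared

print(round(r_squared, 3), round(unexplained, 3))  # 0.6 0.4
```

Here the line accounts for about 60% of the variation in the made-up $y$ values, and the remaining 40% stays in the residuals.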
Interpreting $r^2$ in Context
A correct AP interpretation of $r^2$ should include:
the percentage or proportion explained
the response variable
the explanatory variable
the idea of a linear relationship or linear model
A strong interpretation follows this pattern: About [$r^2$ as a percent] of the variation in [response variable] is explained by the linear relationship with [explanatory variable].
Be careful with wording. The phrase “variation in the response variable” is essential. Saying that $r^2$ is the percent of points on the line, the percent of correct predictions, or the percent caused by the explanatory variable is not correct.
What $r^2$ does and does not tell you
A larger $r^2$ means the linear model accounts for more of the response variable’s variability in the data set. This makes $r^2$ useful for describing how well the explanatory variable helps account for changes in the response variable.
A smaller $r^2$ means the model explains less of that variability. That does not automatically make the model useless. In many real settings, response variables are influenced by many factors, so even a moderate explained proportion may still be meaningful in context.
Key Properties of $r^2$
Several properties of the coefficient of determination are important for AP Statistics:
In simple linear regression, $r^2$ is between $0$ and $1$.
Since it is a square, $r^2$ cannot be negative.
It describes explained variation in the response variable, not in the explanatory variable.
It is specifically tied to the linear regression model.
Because $r^2$ is a squared value, it does not show whether the relationship is positive or negative. It measures how much of the variation is explained, not the direction of the relationship.
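A small sketch can make the lost sign concrete. It uses made-up data and a hypothetical helper named `correlation`, and it relies on the fact that in simple linear regression the coefficient of determination is the square of the correlation $r$. The same response values are paired with $x$ in increasing and then in reversed order, so one trend rises and the other falls.

```python
from math import sqrt

def correlation(x, y):
    """Correlation coefficient r for paired lists x and y."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)

# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 5]
y_down = y_up[::-1]  # same values in reverse order, so the trend points downward

r_up = correlation(x, y_up)
r_down = correlation(x, y_down)

print(round(r_up, 3), round(r_down, 3))            # 0.775 -0.775
print(round(r_up ** 2, 3), round(r_down ** 2, 3))  # 0.6 0.6
```

The two correlations have opposite signs, but squaring gives the same value, so the direction has to be read from the slope or from $r$ itself.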
Common Mistakes to Avoid
Students often lose points by interpreting $r^2$ too loosely. Avoid these errors:
saying the explanatory variable explains a percent of the observations
saying a percent of the response variable is caused by the explanatory variable
treating $r^2$ as a measure of slope
forgetting to mention the response variable
leaving out the word variation
Another common mistake is giving a definition with no context. On the AP exam, your interpretation should name the actual variables in the problem. The wording should match the situation being studied.
Writing AP-Ready Interpretations
When you see $r^2$ in a regression setting, move through these steps:
identify the response variable
identify the explanatory variable
convert $r^2$ to a percent if that makes the interpretation clearer
state that this percent of the variation in the response variable is explained by the linear relationship with the explanatory variable
This approach keeps the interpretation focused exactly where AP Statistics wants it: on the proportion of response-variable variation explained by the explanatory variable.
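The steps above can be sketched as a small helper that assembles the sentence. The function name `interpret_r_squared`, the value 0.64, and the variable names in the usage line are all hypothetical and chosen only for illustration.

```python
def interpret_r_squared(r_squared, response, explanatory):
    """Build an interpretation sentence from r^2 and the two variable names."""
    if not 0 <= r_squared <= 1:
        raise ValueError("r_squared must be between 0 and 1")
    percent = f"{r_squared:.1%}"  # convert the proportion to a percent
    return (
        f"About {percent} of the variation in {response} is explained "
        f"by the linear relationship with {explanatory}."
    )

# Hypothetical example values, not taken from any AP question.
sentence = interpret_r_squared(0.64, "final exam score", "hours studied")
print(sentence)
```

The helper only fills in the template; choosing which variable is the response and which is the explanatory variable still depends on reading the problem.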
FAQ
Is $r^2$ always equal to the square of the correlation coefficient $r$?
In simple linear regression with one explanatory variable and a least-squares regression line that includes an intercept, yes: the coefficient of determination equals the square of the correlation coefficient.
That shortcut does not automatically extend to more complicated regression settings. In AP Statistics, you usually work in the simple case, so $r^2$ and $r$ are directly connected there.
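A quick numerical check of this connection, using made-up absences and exam scores (the data and variable names are assumptions for illustration): the sketch computes the proportion of explained variation from the residuals and, separately, squares the correlation, then compares the two.

```python
from math import sqrt

# Made-up data for illustration only.
absences = [0, 1, 2, 3, 5, 6, 8]
scores = [92, 88, 85, 80, 74, 70, 61]

n = len(absences)
x_bar = sum(absences) / n
y_bar = sum(scores) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(absences, scores))
s_xx = sum((x - x_bar) ** 2 for x in absences)
s_yy = sum((y - y_bar) ** 2 for y in scores)

# Least-squares line with an intercept.
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Route 1: one minus (residual variation / total variation).
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(absences, scores))
from_residuals = 1 - sse / s_yy

# Route 2: square of the correlation coefficient.
r = s_xy / sqrt(s_xx * s_yy)
from_correlation = r ** 2

print(from_residuals, from_correlation)  # the two values agree up to rounding
```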
Does changing the units of measurement change $r^2$?
If you only change units, such as inches to centimeters or pounds to kilograms, you are just rescaling the axes. That does not change the proportion of response-variable variation explained by the linear model.
So the numerical value of $r^2$ stays the same under ordinary unit conversions. A different result happens only if you change the variables more substantially, such as using a nonlinear transformation.
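The sketch below checks this on made-up heights and weights, using a hypothetical helper `r_squared` that squares the correlation (valid for simple linear regression). The conversion factors are the standard ones: 2.54 cm per inch and 0.45359237 kg per pound.

```python
def r_squared(x, y):
    """Squared correlation, which equals r^2 in simple linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy ** 2 / (s_xx * s_yy)

# Made-up data for illustration only.
heights_in = [60, 62, 65, 68, 70, 72]
weights_lb = [115, 120, 140, 155, 160, 180]

# Same measurements expressed in metric units.
heights_cm = [h * 2.54 for h in heights_in]
weights_kg = [w * 0.45359237 for w in weights_lb]

r2_imperial = r_squared(heights_in, weights_lb)
r2_metric = r_squared(heights_cm, weights_kg)

print(r2_imperial, r2_metric)  # equal apart from floating-point rounding
```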
Can two data sets with the same $r^2$ look different?
Yes. Two data sets can have the same $r^2$ even if they look quite different.
For example, they may differ in:
sample size
clustering
unusual points
overall shape
This is why $r^2$ should not be the only description of a regression relationship. It captures one feature of model performance, not the full visual pattern of the data.
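Anscombe's quartet, a well-known set of four small data sets, is a standard illustration of this point. The sketch below uses the values as they are commonly reproduced and a hypothetical helper `r_squared`; it only computes $r^2$ for each set, so the different shapes would still need to be seen in scatterplots.

```python
def r_squared(x, y):
    """Squared correlation, which equals r^2 in simple linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy ** 2 / (s_xx * s_yy)

# Anscombe's quartet: sets I to III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

values = [
    r_squared(x123, y1),  # roughly linear scatter
    r_squared(x123, y2),  # clear curve
    r_squared(x123, y3),  # tight line plus one outlier
    r_squared(x4, y4),    # one influential point sets the slope
]
print([round(v, 2) for v in values])  # [0.67, 0.67, 0.67, 0.67]
```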
Why can a restricted range of $x$ values lower $r^2$?
When the explanatory variable only takes values in a narrow interval, there may be less visible change in the response variable across the observed data. That can make the model explain a smaller share of the total variation.
So a study with a restricted range of $x$ values may produce a lower $r^2$ even if the same underlying linear relationship exists in a broader population.
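A minimal sketch of this effect, under stated assumptions: the data are made up, the response follows $2x$ plus a fixed alternating wobble of $+3$ and $-3$ (chosen instead of random noise so the result is reproducible), and `r_squared` is a hypothetical helper that squares the correlation.

```python
def r_squared(x, y):
    """Squared correlation, which equals r^2 in simple linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy ** 2 / (s_xx * s_yy)

# Made-up data: y is 2x plus a fixed +3 / -3 wobble.
x_full = list(range(1, 21))
y_full = [2 * xi + (3 if xi % 2 == 1 else -3) for xi in x_full]

# Keep only the observations with x between 9 and 12.
narrow = [(xi, yi) for xi, yi in zip(x_full, y_full) if 9 <= xi <= 12]
x_narrow = [p[0] for p in narrow]
y_narrow = [p[1] for p in narrow]

r2_full = r_squared(x_full, y_full)
r2_narrow = r_squared(x_narrow, y_narrow)

print(round(r2_full, 3), round(r2_narrow, 3))  # 0.934 0.1
```

The underlying relationship is identical in both cases; over the narrow window the wobble is simply large relative to the change produced by $x$.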
What is $r^2$ comparing the regression model to?
Conceptually, $r^2$ compares the regression model with a very basic baseline: predicting every response value using the mean of the response variable.
If the regression model improves a lot on that baseline, $r^2$ is larger. If it improves only a little, $r^2$ is smaller.
So $r^2$ is really about how much better the linear model explains response variation than a no-relationship prediction based only on the mean.
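One common way to write this comparison is shown here. The notation is an assumption introduced for this example: $y_i$ for the observed responses, $\hat{y}_i$ for the predictions from the least-squares line, and $\bar{y}$ for the mean response.

$$r^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2}$$

The denominator is the total squared prediction error of the mean-only baseline, and the numerator is the total squared prediction error of the regression line, so the fraction is the share of variation the line leaves unexplained.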
Practice Questions
A simple linear regression model uses number of class absences to predict final exam score. The computer output reports a value of $r^2$.
Interpret $r^2$ in context.
1 mark: States that the reported percent (or the equivalent proportion) of the variation in final exam scores is explained.
1 mark: Clearly connects the explanation to the linear relationship with number of class absences.
A simple linear regression model is used to predict monthly heating cost from average outdoor temperature. The regression output shows a value of $r^2$.
(a) Interpret $r^2$ in context. (2 marks)
(b) What proportion of the variation in monthly heating cost is not explained by the model? (1 mark)
(c) A student says, “Because of this $r^2$, temperature determines that percent of a home’s heating cost.” Explain why this statement is incorrect. (2 marks)
(a) 1 mark: States that the reported percent of the variation in monthly heating cost is explained.
(a) 1 mark: Links the explanation to the linear relationship with average outdoor temperature.
(b) 1 mark: Gives $1 - r^2$ as a proportion or as a percent.
(c) 1 mark: Explains that $r^2$ refers to explained variation, not exact determination of individual costs.
(c) 1 mark: Explains that $r^2$ does not justify a causation claim.
