How to Interpret a Curved Residual Plot (With Example)

by Tutor Aspire

Residual plots are used to assess whether a regression model is correctly specified (for example, whether the relationship between the variables is actually linear) and whether the residuals exhibit heteroscedasticity, i.e. non-constant variance.

Ideally, you would like the points in a residual plot to be randomly scattered around a value of zero with no clear pattern.

If you encounter a residual plot where the points in the plot have a curved pattern, it likely means that the regression model you have specified for the data is not correct.

Often, it means that you attempted to fit a linear regression model to a dataset that instead follows a nonlinear trend, such as a quadratic one.
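To see why a misspecified linear model leaves a curve in the residuals, here is a minimal sketch using made-up, noise-free data. The coefficients 2, 3, and -0.2 are arbitrary assumptions chosen for illustration and are not taken from the example that follows.

```r
#hypothetical, noise-free data that follows an exact quadratic trend
x <- 1:20
y <- 2 + 3*x - 0.2*x^2

#fit a straight line to data that is actually quadratic
fit <- lm(y ~ x)
res <- resid(fit)

#look at the residuals at the low end, the middle, and the high end of x
print(round(res[c(1, 10, 20)], 2))
```

The residuals are negative at both ends of the x range and positive in the middle. The straight line cannot bend, so the part of the trend it misses shows up as an arch in the residual plot.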

The following example shows how to interpret (and fix) a curved residual plot in practice.

Example: Interpreting a Curved Residual Plot

Suppose we collect the following data on the number of hours worked per week and the reported happiness level (on a scale of 0-100) for 11 different people in some office:

Hours worked   Happiness
6              14
9              28
12             50
14             70
30             89
35             94
40             90
47             75
51             59
55             44
60             27

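If we create a simple scatter plot of hours worked vs. happiness level, we see that happiness rises as hours increase up to about 35 hours per week and then falls as hours increase further, which suggests a curved relationship rather than a linear one.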
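A minimal sketch of how such a scatter plot could be produced in base R is shown here. The data are the same values used in the rest of this example; the axis labels and point style are assumptions.

```r
#create data frame (same values used throughout this example)
df <- data.frame(hours=c(6, 9, 12, 14, 30, 35, 40, 47, 51, 55, 60),
                 happiness=c(14, 28, 50, 70, 89, 94, 90, 75, 59, 44, 27))

#create scatter plot of hours worked vs. happiness level
plot(df$hours, df$happiness, xlab='Hours Worked', ylab='Happiness', pch=19)

#find the hours value with the highest reported happiness
peak <- df$hours[which.max(df$happiness)]
print(peak)
```

The highest happiness value in the data (94) occurs at 35 hours, with lower values on either side.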

Now suppose we would like to fit a regression model using hours worked to predict happiness level.

The following code shows how to fit a simple linear regression model to this dataset and produce a residual plot in R:

#create data frame
df <- data.frame(hours=c(6, 9, 12, 14, 30, 35, 40, 47, 51, 55, 60),
                 happiness=c(14, 28, 50, 70, 89, 94, 90, 75, 59, 44, 27))

#fit linear regression model
linear_model <- lm(happiness ~ hours, data=df)

#get list of residuals
res <- resid(linear_model)

#produce residual vs. fitted plot
plot(fitted(linear_model), res, xlab='Fitted Values', ylab='Residuals')

#add a horizontal line at 0
abline(0,0)

[Figure: curved residual plot]

The x-axis displays the fitted values and the y-axis displays the residuals.

From the plot we can see that there is a curved pattern in the residuals, which indicates that a linear regression model does not provide an appropriate fit to this dataset.
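The curved pattern can also be checked numerically rather than by eye. The sketch below lists the residuals of the linear model in order of hours worked and prints their signs; the helper names `check` and `signs` are introduced only for illustration.

```r
#create data frame and fit the linear model from the example
df <- data.frame(hours=c(6, 9, 12, 14, 30, 35, 40, 47, 51, 55, 60),
                 happiness=c(14, 28, 50, 70, 89, 94, 90, 75, 59, 44, 27))
linear_model <- lm(happiness ~ hours, data=df)

#pair each residual with its hours value and sort by hours
check <- data.frame(hours=df$hours, residual=resid(linear_model))
check <- check[order(check$hours), ]
print(round(check, 1))

#sign of each residual (-1 = below the fitted line, 1 = above it)
signs <- sign(check$residual)
print(signs)
```

For these data the residuals are negative for the smallest and largest hours values and positive in between. Long runs of same-signed residuals like this are what a curved residual plot looks like in numbers; randomly scattered residuals would switch sign far more often.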

The following code shows how to fit a quadratic regression model to this dataset and produce a residual plot in R:

#create data frame
df <- data.frame(hours=c(6, 9, 12, 14, 30, 35, 40, 47, 51, 55, 60),
                 happiness=c(14, 28, 50, 70, 89, 94, 90, 75, 59, 44, 27))

#define quadratic term to use in model
df$hours2 <- df$hours^2

#fit quadratic regression model
quadratic_model <- lm(happiness ~ hours + hours2, data=df)

#get list of residuals
res <- resid(quadratic_model)

#produce residual vs. fitted plot
plot(fitted(quadratic_model), res, xlab='Fitted Values', ylab='Residuals')

#add a horizontal line at 0
abline(0,0)

Once again the x-axis displays the fitted values and the y-axis displays the residuals.

From the plot we can see that the residuals are randomly scattered around zero and there is no clear pattern in the residuals.

This tells us that a quadratic regression model fits this dataset much better than a linear regression model.
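One way to back up this visual comparison with numbers is to compare the two models directly. The sketch below uses adjusted R-squared and an F-test on the nested models; choosing these two measures is an assumption, and other criteria (such as AIC) could be used instead.

```r
#create data frame and quadratic term
df <- data.frame(hours=c(6, 9, 12, 14, 30, 35, 40, 47, 51, 55, 60),
                 happiness=c(14, 28, 50, 70, 89, 94, 90, 75, 59, 44, 27))
df$hours2 <- df$hours^2

#fit both models
linear_model <- lm(happiness ~ hours, data=df)
quadratic_model <- lm(happiness ~ hours + hours2, data=df)

#compare adjusted R-squared of the two models
adj_linear <- summary(linear_model)$adj.r.squared
adj_quadratic <- summary(quadratic_model)$adj.r.squared
print(c(linear=adj_linear, quadratic=adj_quadratic))

#F-test of whether adding the quadratic term improves the fit
comparison <- anova(linear_model, quadratic_model)
print(comparison)
```

A noticeably higher adjusted R-squared for the quadratic model, together with a small p-value in the `anova()` output, supports what the residual plots suggest.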

This should make sense considering we saw that the true relationship between hours worked and happiness level appeared to be quadratic instead of linear.
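To see the quadratic fit against the raw data, the fitted curve can be drawn over the scatter plot. The following is a minimal sketch: it writes the squared term inline as `I(hours^2)`, which gives the same fit as a separate squared column, and the names `grid` and `peak_hours` are introduced only for illustration.

```r
#create data frame
df <- data.frame(hours=c(6, 9, 12, 14, 30, 35, 40, 47, 51, 55, 60),
                 happiness=c(14, 28, 50, 70, 89, 94, 90, 75, 59, 44, 27))

#fit quadratic model with the squared term written inline
quadratic_model <- lm(happiness ~ hours + I(hours^2), data=df)

#scatter plot of the raw data
plot(df$hours, df$happiness, xlab='Hours Worked', ylab='Happiness', pch=19)

#add fitted quadratic curve over a fine grid of hours values
grid <- data.frame(hours=seq(min(df$hours), max(df$hours), length.out=100))
lines(grid$hours, predict(quadratic_model, newdata=grid))

#estimated hours at which predicted happiness peaks: -b1 / (2*b2)
b <- coef(quadratic_model)
peak_hours <- unname(-b[2] / (2 * b[3]))
print(peak_hours)
```

A negative coefficient on the squared term means the fitted curve opens downward, so `peak_hours` is the number of hours at which the model predicts the highest happiness level.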

Additional Resources

The following tutorials explain how to create residual plots using different statistical software:

How to Create a Residual Plot by Hand
How to Create a Residual Plot in R
How to Create a Residual Plot in Excel
How to Create a Residual Plot in Python