Residual plots are used to assess whether a regression model is correctly specified and whether its residuals exhibit heteroscedasticity (non-constant variance).
Ideally, you would like the points in a residual plot to be randomly scattered around a value of zero with no clear pattern.
If you encounter a residual plot where the points in the plot have a curved pattern, it likely means that the regression model you have specified for the data is not correct.
A common cause is fitting a linear regression model to a dataset that instead follows a quadratic (or otherwise nonlinear) trend.
The following example shows how to interpret (and fix) a curved residual plot in practice.
Example: Interpreting a Curved Residual Plot
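Suppose we collect the following data on the number of hours worked per week and the reported happiness level (on a scale of 0-100) for 11 different people in some office:
Hours worked: 6, 9, 12, 14, 30, 35, 40, 47, 51, 55, 60
Happiness level: 14, 28, 50, 70, 89, 94, 90, 75, 59, 44, 27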
If we create a simple scatter plot of hours worked vs. happiness level, here’s what it would look like:
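As a minimal sketch, a scatter plot like this could be produced in R as follows. The values are the same ones used in the regression code later in this example; the axis labels and the filled point style (pch=16) are illustrative choices, not part of the original analysis.

```r
#create dataframe (same values used in the regression code in this example)
df <- data.frame(hours=c(6, 9, 12, 14, 30, 35, 40, 47, 51, 55, 60),
                 happiness=c(14, 28, 50, 70, 89, 94, 90, 75, 59, 44, 27))

#create scatter plot of hours worked vs. happiness level
#(axis labels and point style are illustrative assumptions)
plot(df$hours, df$happiness, xlab='Hours Worked', ylab='Happiness Level', pch=16)
```

If the points rise and then fall, as they do for these values, that is an early hint that a straight line may not describe the relationship well.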
Now suppose we would like to fit a regression model using hours worked to predict happiness level.
The following code shows how to fit a simple linear regression model to this dataset and produce a residual plot in R:
#create dataframe
df <- data.frame(hours=c(6, 9, 12, 14, 30, 35, 40, 47, 51, 55, 60),
                 happiness=c(14, 28, 50, 70, 89, 94, 90, 75, 59, 44, 27))

#fit linear regression model
linear_model <- lm(happiness ~ hours, data=df)

#get list of residuals
res <- resid(linear_model)

#produce residual vs. fitted plot
plot(fitted(linear_model), res, xlab='Fitted Values', ylab='Residuals')

#add a horizontal line at 0
abline(0,0)
The x-axis displays the fitted values and the y-axis displays the residuals.
From the plot we can see that there is a curved pattern in the residuals, which indicates that a linear regression model does not provide an appropriate fit to this dataset.
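One way to make a curved pattern easier to see is to overlay a smoothed trend line on the residual plot. The sketch below uses base R's lowess() smoother for this; the smoother and the dashed line style are additions for illustration and are not part of the original example.

```r
#create dataframe
df <- data.frame(hours=c(6, 9, 12, 14, 30, 35, 40, 47, 51, 55, 60),
                 happiness=c(14, 28, 50, 70, 89, 94, 90, 75, 59, 44, 27))

#fit linear regression model and get residuals
linear_model <- lm(happiness ~ hours, data=df)
res <- resid(linear_model)

#produce residual vs. fitted plot with a horizontal line at 0
plot(fitted(linear_model), res, xlab='Fitted Values', ylab='Residuals')
abline(0,0)

#overlay a lowess smooth of the residuals (illustrative addition)
#a smooth that bends away from the zero line suggests a misspecified model
lines(lowess(fitted(linear_model), res), lty=2)
```

With only 11 observations the smooth is rough, so it is best read as a visual aid rather than a formal test.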
The following code shows how to fit a quadratic regression model to this dataset and produce a residual plot in R:
#create dataframe
df <- data.frame(hours=c(6, 9, 12, 14, 30, 35, 40, 47, 51, 55, 60),
                 happiness=c(14, 28, 50, 70, 89, 94, 90, 75, 59, 44, 27))

#define quadratic term to use in model
df$hours2 <- df$hours^2

#fit quadratic regression model
quadratic_model <- lm(happiness ~ hours + hours2, data=df)

#get list of residuals
res <- resid(quadratic_model)

#produce residual vs. fitted plot
plot(fitted(quadratic_model), res, xlab='Fitted Values', ylab='Residuals')

#add a horizontal line at 0
abline(0,0)
Once again the x-axis displays the fitted values and the y-axis displays the residuals.
From the plot we can see that the residuals are randomly scattered around zero and there is no clear pattern in the residuals.
This tells us that a quadratic regression model does a much better job of fitting this dataset than a linear regression model.
This makes sense, since the scatter plot suggested that the relationship between hours worked and happiness level is quadratic rather than linear.
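The visual comparison can also be backed up numerically. The sketch below compares the two models using adjusted R-squared and a nested-model F-test via anova(); these particular checks are an assumption about how one might confirm the result and are not part of the original example.

```r
#create dataframe and quadratic term
df <- data.frame(hours=c(6, 9, 12, 14, 30, 35, 40, 47, 51, 55, 60),
                 happiness=c(14, 28, 50, 70, 89, 94, 90, 75, 59, 44, 27))
df$hours2 <- df$hours^2

#fit both models
linear_model <- lm(happiness ~ hours, data=df)
quadratic_model <- lm(happiness ~ hours + hours2, data=df)

#compare adjusted R-squared (higher indicates a better fit after
#penalizing for the extra term)
adj_r2_linear <- summary(linear_model)$adj.r.squared
adj_r2_quadratic <- summary(quadratic_model)$adj.r.squared
print(c(linear=adj_r2_linear, quadratic=adj_r2_quadratic))

#F-test for whether adding the quadratic term significantly improves the fit
print(anova(linear_model, quadratic_model))
```

A noticeably higher adjusted R-squared for the quadratic model, together with a small p-value in the anova() output, would support the conclusion drawn from the residual plots.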
Additional Resources
The following tutorials explain how to create residual plots by hand and in different statistical software:
How to Create a Residual Plot by Hand
How to Create a Residual Plot in R
How to Create a Residual Plot in Excel
How to Create a Residual Plot in Python