A residual is the difference between an observed value and a predicted value in a regression model.
It is calculated as:
Residual = Observed value – Predicted value
One way to understand how well a regression model fits a dataset is to calculate the residual sum of squares, which is calculated as:
Residual sum of squares = Σ(ei)2
where:
- Σ: A Greek symbol that means “sum”
- ei: The ith residual
The lower the value, the better a model fits a dataset.
This tutorial provides a step-by-step example of how to calculate the residual sum of squares for a regression model in Python.
Step 1: Enter the Data
For this example we’ll enter data for the number of hours spent studying, total prep exams taken, and exam score received by 14 different students:
import pandas as pd #create DataFrame df = pd.DataFrame({'hours': [1, 2, 2, 4, 2, 1, 5, 4, 2, 4, 4, 3, 6, 5], 'exams': [1, 3, 3, 5, 2, 2, 1, 1, 0, 3, 4, 3, 2, 4], 'score': [76, 78, 85, 88, 72, 69, 94, 94, 88, 92, 90, 75, 96, 90]})
Step 2: Fit the Regression Model
Next, we’ll use the OLS() function from the statsmodels library to perform ordinary least squares regression, using “hours” and “exams” as the predictor variables and “score” as the response variable:
import statsmodels.api as sm
#define response variable
y = df['score']
#define predictor variables
x = df[['hours', 'exams']]
#add constant to predictor variables
x = sm.add_constant(x)
#fit linear regression model
model = sm.OLS(y, x).fit()
#view model summary
print(model.summary())
OLS Regression Results
==============================================================================
Dep. Variable: score R-squared: 0.722
Model: OLS Adj. R-squared: 0.671
Method: Least Squares F-statistic: 14.27
Date: Sat, 02 Jan 2021 Prob (F-statistic): 0.000878
Time: 15:58:35 Log-Likelihood: -41.159
No. Observations: 14 AIC: 88.32
Df Residuals: 11 BIC: 90.24
Df Model: 2
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 71.8144 3.680 19.517 0.000 63.716 79.913
hours 5.0318 0.942 5.339 0.000 2.958 7.106
exams -1.3186 1.063 -1.240 0.241 -3.658 1.021
==============================================================================
Omnibus: 0.976 Durbin-Watson: 1.270
Prob(Omnibus): 0.614 Jarque-Bera (JB): 0.757
Skew: -0.245 Prob(JB): 0.685
Kurtosis: 1.971 Cond. No. 12.1
==============================================================================
Step 3: Calculate the Residual Sum of Squares
We can use the following code to calculate the residual sum of squares for the model:
print(model.ssr)
293.25612951525414
The residual sum of squares turns out to be 293.256.
Additional Resources
How to Perform Simple Linear Regression in Python
How to Perform Multiple Linear Regression in Python
Residual Sum of Squares Calculator